Clusterwiki
wikidb
https://uhhpc.herts.ac.uk//wiki/index.php/Main_Page
MediaWiki 1.34.0
first-letter
Media
Special
Talk
User
User talk
Clusterwiki
Clusterwiki talk
File
File talk
MediaWiki
MediaWiki talk
Template
Template talk
Help
Help talk
Category
Category talk
Main Page
0
1
1
2010-05-06T15:17:36Z
MediaWiki default
0
wikitext
text/x-wiki
<big>'''MediaWiki has been successfully installed.'''</big>
Consult the [http://meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
== Getting started ==
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings Configuration settings list]
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
bd962048d95fbb6b6b514885867811db20a5476b
2
1
2010-05-06T15:20:28Z
WikiSysop
1
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
== Getting started with MediaWiki ==
* How to use the software: [http://meta.wikimedia.org/wiki/Help:Contents User's Guide]
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings Configuration settings list]
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
2e0f86b711880cf55c1714183c9ef4b4cf9c8a08
3
2
2010-05-06T15:51:50Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Access]]
* [[Running jobs]]
== Getting started with MediaWiki ==
* How to use the software: [http://meta.wikimedia.org/wiki/Help:Contents User's Guide]
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings Configuration settings list]
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
6a3f9e53e7de69472eca6df1cead09aa4a86cac6
4
3
2010-05-06T15:54:20Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Access]]
* [[Running jobs]]
== More esoteric ==
* [[MediaWiki]]
255c70add088f942a0ba35a563a4b2b1df1b73f6
7
4
2010-05-06T16:07:21Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Running jobs]]
== More esoteric ==
* [[MediaWiki]]
2041e0a624c8006b8d4828ee81a2e71b1ceec93b
11
7
2010-05-06T16:25:44Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Running jobs]]
* [[Administrators]] contact details
== More esoteric ==
* [[MediaWiki]]
ee8ce7bf3a8c476783b1860740d2d4ccb76b4513
16
11
2010-05-06T16:32:44Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Jobs]]
* [[Administrators]] contact details
== More esoteric ==
* [[MediaWiki]]
490166b68cf9306d6ac052ab0a0f29ab1310aab7
22
16
2010-05-06T16:46:36Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Jobs]]
* [[Administrators]]' contact details
== More esoteric ==
* [[MediaWiki]]
b8245a5aff8f5a437e219234ee0bf95301cad741
28
22
2010-05-06T18:58:02Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki will be the location for documentation for the cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
== More esoteric ==
* [[MediaWiki]]
601e85b3823b46132a87851ff368987ff89306dd
30
28
2010-05-06T19:23:37Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
183338807182c701d85970732c800983ecbd7f5f
48
30
2010-05-09T11:57:05Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
884b120e8cd5b100889bcec0711f1bd4771be274
MediaWiki
0
2
5
2010-05-06T15:54:41Z
Mjh
2
Create page with old data
wikitext
text/x-wiki
== Getting started with MediaWiki ==
* How to use the software: [http://meta.wikimedia.org/wiki/Help:Contents User's Guide]
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings Configuration settings list]
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
e4b16b14650bfe6e91f2e2b73541f565bfc03194
Accounts
0
3
6
2010-05-06T16:07:02Z
Mjh
2
Created page with '== Accounts == To get an account, speak to John Atkinson in E117C. Accounts are available to the following classes of people: * Members of the Centre for Astrophysics Research…'
wikitext
text/x-wiki
== Accounts ==
To get an account, speak to John Atkinson in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research
* Members of the School of Computer Science
* Others, by special arrangement.
Access is granted subject to observance of our usage [[policies]].
7cdf3522684ddac8764bba6d66ed04b76f9ba820
19
6
2010-05-06T16:37:24Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to John Atkinson in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research
* Members of the School of Computer Science
* Others, by special arrangement.
Access is granted subject to observance of our usage [[policies]].
e3c2f38b6d652146226b4cd64b711e6acb1a2be9
29
19
2010-05-06T19:20:18Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to John Atkinson in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research (CAIR)
* Members of the School of Computer Science (CS)
* Others, by special arrangement; restricted to those who have made a financial contribution to the cluster.
Access is granted subject to observance of our usage [[policies]].
e56ad6b08b401d7fe037da119409d8179c291d43
Policies
0
4
8
2010-05-06T16:08:47Z
Mjh
2
Created page with '== Policies == The cluster is by design a shared resource. In using it you must be considerate of other users.'
wikitext
text/x-wiki
== Policies ==
The cluster is by design a shared resource. In using it you must be considerate of other users.
c56586e982b6c75c406abd6f933f265dfd36b4cd
20
8
2010-05-06T16:45:45Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must never be used for computation on a larger scale than a few minutes' testing. It is the file server for the whole cluster and all user logins have to pass through it.
* The normal method of using the compute nodes is by way of the [[batch queuing system|jobs]]. You should not log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks).
* When using the batch queuing system you must honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
3041a06060bedadbd2965ffc8b96f57ffb865411
21
20
2010-05-06T16:46:10Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must never be used for computation on a larger scale than a few minutes' testing. It is the file server for the whole cluster and all user logins have to pass through it.
* The normal method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You should not log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks).
* When using the batch queuing system you must honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
8767b2d0b1fb6776efe0818b36291d997e14f9fa
47
21
2010-05-09T11:56:16Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing.
* The normal method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks).
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
6c16bc37740e2a13d32cc48c944e21459318d693
Access
0
5
9
2010-05-06T16:24:00Z
Mjh
2
Created page with '== Access == The [[architecture|head node]] of the cluster is accessible from within the university by ssh to stri-cluster.herts.ac.uk, once you have an [[accounts|account]] set…'
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]] of the cluster is accessible from within the university by ssh to stri-cluster.herts.ac.uk, once you have an [[accounts|account]] set up.
Currently it is not possible to access the head node from outside the University network; if you need this please discuss with the [[administrators]].
3c1e5c55469f4c9248d912129d0779ba0eed57e5
15
9
2010-05-06T16:32:18Z
Mjh
2
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]] of the cluster is accessible from within the university by ssh to stri-cluster.herts.ac.uk, once you have an [[accounts|account]] set up.
Currently it is not possible to access the head node from outside the University network; if you need this please discuss with the [[administrators]].
Individual compute nodes must be accessed via the head node: see also the [[policies|policy]] relating to this.
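For example, once your account exists you would log in to the head node from a machine on the University network with something like the following (''username'' here is just a placeholder for your own cluster account name):
<pre>
ssh username@stri-cluster.herts.ac.uk
</pre>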
070e8c519fda2bcc57cb9cb83b6e9b94ba1f302e
Administrators
0
6
10
2010-05-06T16:24:57Z
Mjh
2
Created page with '== Administrators == These are currently: * John Atkinson, j.atkinson@herts.ac.uk * Martin Hardcastle, m.j.hardcastle@herts.ac.uk Contact us with queries.'
wikitext
text/x-wiki
== Administrators ==
These are currently:
* John Atkinson, j.atkinson@herts.ac.uk
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk
Contact us with queries.
3be9f73f8ea84108bdeefb64f1903aa75ef83e93
12
10
2010-05-06T16:26:31Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* John Atkinson, j.atkinson@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 1E116).
Contact us with queries.
439d31c89a0099465f042c4c1c948f2ebf19f3cf
23
12
2010-05-06T16:47:27Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* John Atkinson, j.atkinson@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 1E116).
Contact us with queries.
[[File:m.jpg]]
eafb89ab072cedfeab448de9a98ebdc7f04769ef
24
23
2010-05-06T16:47:44Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* John Atkinson, j.atkinson@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 1E116).
Contact us with queries.
439d31c89a0099465f042c4c1c948f2ebf19f3cf
Architecture
0
7
13
2010-05-06T16:30:42Z
Mjh
2
Created page with '== Architecture == The cluster consists of * a head node, which is an 8-core Xeon-based machine with 32 Gb RAM * 80 compute nodes (or just 'nodes'), of which ** 48 8-core Xeons…'
wikitext
text/x-wiki
== Architecture ==
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster
* 110 TB of storage attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
[[Networking]] details are described elsewhere.
f552909bf1d8cb4b1ce26bb5d7f0acb121997a82
14
13
2010-05-06T16:31:19Z
Mjh
2
wikitext
text/x-wiki
== Architecture ==
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster
* 110 TB of [[storage]] attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
[[Networking]] details are described elsewhere.
38e0f4b999ec0b49a24fd1832da11d1490995921
18
14
2010-05-06T16:37:10Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster
* 110 TB of [[storage]] attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
[[Networking]] details are described elsewhere.
b3f4a3330c770947828c92330a28cc6981214c6c
27
18
2010-05-06T18:56:04Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster
* 110 TB of [[storage]] attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
0aa8d65123631849e292588e1521cf84288fb96c
Storage
0
8
17
2010-05-06T16:36:50Z
Mjh
2
Created page with 'The '''storage''' available on the cluster is set up as follows: * 1 Tb of user home directories, mounted as /home * 65 Tb of scratch available to all users, mounted as /stri-da…'
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 TB of user home directories, mounted as /home
* 65 TB of scratch available to all users, mounted as /stri-data
* 40 TB of scratch for CAIR users only, mounted as /cair-data
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large astronomical datasets will be processed on the cluster. (See also [[policies]].)
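To check how much space is currently free on these areas, standard tools such as <tt>df</tt> can be run from the head node, e.g.:
<pre>
df -h /home /stri-data /cair-data
</pre>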
4f3656586c0b7f9c2f39aa76084cfec8f0e19529
Jobs
0
9
25
2010-05-06T17:04:33Z
Mjh
2
Created page with 'The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [htt…'
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
24ade543a4db770d6f99a62868971e1dc0504e92
34
25
2010-05-07T14:23:25Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis (see [[networking]]).
b72ae512f593f34adbabea9b64d5a4d2df39a9e2
35
34
2010-05-07T18:48:11Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You can only view your own jobs using qstat; however, the MAUI tool <tt>showq</tt> gives a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
410c741570a279c971ff51ae59eaa1a2db48f2f4
36
35
2010-05-07T19:28:07Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You can only view your own jobs using qstat; however, the MAUI tool <tt>showq</tt> gives a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
67c838a6d5c7ec067292c3096aba44989f7e54a0
37
36
2010-05-08T13:34:44Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) You can only view your own jobs using qstat; however, the MAUI tool <tt>showq</tt> gives a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
c0cf30f71fd946cbe4da1460c2d48a1a132a1918
38
37
2010-05-09T08:47:21Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
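For instance, submitting the script above with a conflicting command-line option (a purely illustrative name) shows the precedence rule in action: the job appears in the queue as <tt>goodbye</tt> rather than <tt>hello</tt>.
<pre>
qsub -N goodbye myjob2.sh
</pre>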
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) You can only view your own jobs using qstat; however, the MAUI tool <tt>showq</tt> gives a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
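For example, to remove one of the queued jobs shown in the listing above, pass its numeric job ID (as reported by <tt>qstat</tt>) to <tt>qdel</tt>:
<pre>
qdel 1770
</pre>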
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
This command would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
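As a sketch of that pattern, each copy started by <tt>pbsdsh</tt> can label its own output. The wrapper name, the program <tt>./my-montecarlo</tt> and the use of <tt>$PBS_VNODENUM</tt> (which our version of Torque is expected to set for each task that <tt>pbsdsh</tt> starts) are illustrative assumptions, not part of the example above:
<pre>
#!/bin/sh
# run-one.sh: started once per allocated processor via "pbsdsh /home/myusername/run-one.sh"
ID=${PBS_VNODENUM:-0}                  # task number assigned by pbsdsh (assumed; check on the cluster)
cd /home/myusername/mc-run             # work in a cluster-wide file system
./my-montecarlo --seed $ID > out.$ID   # hypothetical program; one output file per copy
</pre>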
39b2c132c364b69a1f52dd95554de05f4cef4304
42
38
2010-05-09T11:07:48Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler then decides, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
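If the program you want to run is a compiled binary, a minimal wrapper script is all that is needed (the binary name <tt>./mycode</tt> is just an illustration); submit the wrapper with <tt>qsub</tt> as usual:
<pre>
#!/bin/sh
# wrapper.sh: gives qsub a script to run
cd $PBS_O_WORKDIR    # change to the directory the job was submitted from
./mycode             # the real (binary) program
</pre>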
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
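For example, submitting the second script above with a conflicting name on the command line:
<pre>
qsub -N goodbye myjob2.sh    # the "#PBS -N hello" directive is overridden: the job is named goodbye
</pre>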
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need leads to inefficient scheduling of your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its requested walltime will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) You can only view your own jobs with <tt>qstat</tt>; however, the Maui tool <tt>showq</tt> gives a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start work on any of the other nodes allocated to it.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which gives the path of a file listing the nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
1f35afc5a24981313d6f0ce37a84f7f1c69d9a2c
Networking
0
10
26
2010-05-06T18:53:40Z
Mjh
2
Created page with 'The nodes are linked by Gigabit ethernet and Infiniband networks. The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet…'
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected via a main ethernet switch. Management traffic also uses this switch.
There are in fact two infiniband networks: one for the main cluster, which is dual data-rate (nominal 2.5 Gbit/s) and one for the CAIR cluster, which is quad data-rate (5.0 Gbit/s). As for the ethernet, each chassis has an internal Infiniband switch. The two sub-clusters each have an external Infiniband switch which connects the chassis that make up the cluster and the head node is connected, separately, to both of those switches. Therefore there is no native infiniband connectivity between the two sub-clusters. IP over Infiniband can be used to connect between them, but this should not be relied on for large data volumes as it must be routed via the head node.
The networking arrangements should be taken into consideration when running jobs where fast communication is important: latency is somewhat lower, and data transfer rates somewhat higher, between nodes in the same chassis than between nodes in different chassis of the same sub-cluster; connections between the two sub-clusters are higher-latency and lower-bandwidth still, and native Infiniband connections between the two sub-clusters are essentially impossible. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. The nodes can, however, all see the public Internet via IP masquerading on the head node over the ethernet network. (For this reason, it is not sensible to try to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx or 4.xxx, where the 3 and 4 refer to the main and CAIR clusters respectively). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
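As a quick illustration (from the head node or any node; <tt>node001</tt> is just an example), the two names for a node reach it over the two different networks:
<pre>
ping -c 1 node001.data    # ethernet address (192.168.2.xxx)
ping -c 1 node001.infi    # Infiniband (IP over IB) address (192.168.3.xxx or 192.168.4.xxx)
</pre>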
f8d84ed07663bde5f5bcbdb5e983101974fab77f
Clusterwiki:About
4
11
31
2010-05-06T19:24:23Z
Mjh
2
Created page with 'This wiki documents the STRI cluster.'
wikitext
text/x-wiki
This wiki documents the STRI cluster.
7b631f7f0c1d2c5642063355147d292172d16a3e
32
31
2010-05-06T19:25:02Z
Mjh
2
wikitext
text/x-wiki
This wiki documents the STRI cluster.
It uses [[Mediawiki]] and runs under Linux on the cluster head node.
9d32ed9b7e49666fd0bc9610a4efbaa2226173ee
33
32
2010-05-06T19:25:19Z
Mjh
2
wikitext
text/x-wiki
This wiki documents the STRI cluster.
It uses [[MediaWiki]] and runs under Linux on the cluster head node.
08be33541d11f18a681e80ec16541a2c39c7fbd7
MPI
0
12
39
2010-05-09T10:51:52Z
Mjh
2
Created page with '== What is MPI? == MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Pr…'
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface Wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that.
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2. A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on the allocated nodes. You will need [[passwordless ssh]] set up for this to work.
8961e4575964141945840229e3e2351b15c45ea5
40
39
2010-05-09T11:00:40Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface Wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that.
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2 (at present <tt>mpiexec</tt> does not work, though it's supposed to). A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on all appropriate machines. You will need [[passwordless ssh]] set up for this to work.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/bin/torque-mv /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
4232a4110833461c79bad5e89960b3de46ef468c
41
40
2010-05-09T11:01:11Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface Wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that.
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2 (at present <tt>mpiexec</tt> does not work, though it's supposed to). A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on all appropriate machines. You will need [[passwordless ssh]] set up for this to work.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/bin/torque-mv /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
1fc6554883a752301c119627f1bd3ef678dd8a04
45
41
2010-05-09T11:31:23Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface Wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that.
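Whichever implementation you select, compilation looks much the same; for example (source file names are illustrative):
<pre>
mpicc -o mympijob mympijob.c     # C code
mpif77 -o mympijob mympijob.f    # Fortran 77 code
</pre>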
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2 (at present <tt>mpiexec</tt> does not work, though it's supposed to). A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on all appropriate machines. You will need [[passwordless ssh]] set up for this to work.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/bin/torque-mv /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
f4169e7686c1a21e8ab9fc5b6e5eb25d8ba0be91
Passwordless ssh
0
13
43
2010-05-09T11:18:20Z
Mjh
2
Created page with 'For some applications you will need to enable passwordless ssh between nodes. The simplest way of doing this is as follows: * run <tt>ssh-keygen</tt> and generate a key with no…'
wikitext
text/x-wiki
For some applications you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key with no passphrase (just press return when prompted).
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* To get your <tt>known_hosts</tt> file filled up correctly, do <tt>pdsh -f 1 -w 'node[001-080]' hostname</tt>. The first time you do this, you should see a bunch of messages about files being added to <tt>known_hosts</tt>. If you then do it again, you should just see the hostnames of all the nodes appearing in order.
* Passwordless ssh is now set up.
2279d37263a9aaaa565e0a9c116ed78f5d07173a
44
43
2010-05-09T11:28:29Z
Mjh
2
wikitext
text/x-wiki
For some applications you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key with no passphrase (just press return when prompted).
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* Passwordless ssh is now set up: try e.g. <tt>ssh node001 hostname</tt> to test it. (The whole sequence is collected in the sketch below.)
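Collected as a single session (this assumes the default RSA key, <tt>id_rsa</tt>, and that your home directory is visible on all nodes, as the steps above presume):
<pre>
ssh-keygen                 # accept the defaults; press return for an empty passphrase
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
ssh node001 hostname       # should print the node name without asking for a password
</pre>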
db879b0ce64cbfbd57cc94791948aeddf50113de
Parallelization
0
14
46
2010-05-09T11:55:05Z
Mjh
2
Created page with 'It is very important to realise that running a job on the cluster does not automatically give you access to all the CPU resources of the cluster (or even of the subset of the clu…'
wikitext
text/x-wiki
It is very important to realise that running a job on the cluster does not automatically give you access to all the CPU resources of the cluster (or even of the subset of the cluster you request from the [[jobs|job control system]]).
There is one, and only one, situation in which you can run normal, single-threaded code on the cluster and get an improvement in performance. That is the situation in which ''you can usefully run multiple copies of the same program simultaneously without any communication between the different copies''; in other words, you are only limited by how many CPUs you can get access to. Examples might include Monte Carlo simulation or some data analysis tasks. Problems of this kind are called ''embarrassingly parallel''. Provided that your code is safe to run concurrently (that is, it does not have implementation errors that prevent several copies running at the same time, such as the use of temporary files with fixed names) you can use the cluster for this sort of problem without modifying your code. You may be able to use the job control system with commands such as <tt>pbsdsh</tt>, or you may need to request that a node be dedicated to your task.
Once you need the different parts of your code to communicate, you in general ''must'' use code that is written specifically to do so. At present [[MPI]] is the method provided to do this. If you are planning to run an application written by a third party, it needs to use [[MPI]] if you expect to be able to run on more than one node. If you are running your own code, you will need to modify it, often very substantially, to allow parallel execution. '''This is your responsibility, not that of the cluster administrators.'''
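As a concrete sketch of the embarrassingly parallel case (the program <tt>./mc_sim</tt> and its options are hypothetical), a job script that has been allocated one node with 8 processors can start 8 independent copies, each with its own seed and output file, and wait for them all to finish:
<pre>
#!/bin/sh
#PBS -N mc-batch
#PBS -l nodes=1:ppn=8
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
for i in 1 2 3 4 5 6 7 8
do
  ./mc_sim --seed $i > result.$i &   # hypothetical single-threaded program, one copy per processor
done
wait                                 # do not let the job exit until all copies have finished
</pre>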
7cc2fed98d405f0817c8269b1edebeed022d5080
Queues
0
15
49
2010-05-09T12:26:56Z
Mjh
2
Created page with 'There are four possible job queues available on the system: * 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on th…'
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* Finally 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters.
1629bba541d9767351ce758bb610ccb8c85c4138
50
49
2010-05-09T12:27:13Z
Mjh
2
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* Finally 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters. (An example of selecting a queue at submission time is given below.)
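A queue is selected with the <tt>-q</tt> option to <tt>qsub</tt> (see [[jobs]]); for example (script names and resource requests are illustrative):
<pre>
qsub -q cair_s -l walltime=4:00:00 -l nodes=4:ppn=8 myjob.sh    # 4 CAIR nodes, within the 6-hour limit
qsub -q all -l walltime=1:30:00 -l nodes=60:ppn=8 myjob.sh      # spans both sub-clusters, within the 2-hour limit
</pre>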
a8c0b319e35e9b61d71396298a24578ceeecca0a
Jobs
0
9
51
42
2010-05-09T16:49:30Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler then decides, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need leads to inefficient scheduling of your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its requested walltime will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start work on any of the other nodes allocated to it.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which gives the path of a file listing the nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
c9f53a41c61d606b569a8f75c912a9d06c5cdf4b
63
51
2010-05-20T15:15:15Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler then decides, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration, requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the second request above would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always ask for and use 8 processes per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need leads to inefficient scheduling of your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its requested walltime will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
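Two further <tt>qstat</tt> views that are often useful (the job ID is taken from the listing above; see <tt>man qstat</tt> for the full set of options):
<pre>
qstat -n 1765    # also list the nodes allocated to the job
qstat -f 1765    # full details of a single job (resources requested, output paths, ...)
</pre>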
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start work on any of the other nodes allocated to it.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which gives the path of a file listing the nodes on which the job should run. ''You must honour this node list.''
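A common first step in a job script is to work out how much has actually been allocated. A minimal sketch, assuming (as on our Torque setup) that the node file contains one line per allocated processor:
<pre>
NP=$(wc -l < $PBS_NODEFILE)              # number of processors allocated to the job
NODES=$(sort -u $PBS_NODEFILE | wc -l)   # number of distinct physical nodes
echo "Running on $NP processors across $NODES nodes"
</pre>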
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
1149f7c7b5ea985959aa401f95ea77275d1ecb66
64
63
2010-05-28T15:03:17Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler then decides, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
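As a small illustration of the precedence rule, using the <tt>myjob2.sh</tt> script above (the job name <tt>goodbye</tt> is an arbitrary example):
<pre>
qsub -N goodbye myjob2.sh  ## queued as "goodbye": the command line overrides the "#PBS -N hello" directive
</pre>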
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the second request above would actually run on one physical node using all 8 of its CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request and use 8 processes per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
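For example, to remove one of the queued jobs shown above, give <tt>qdel</tt> the numeric job ID reported by <tt>qstat</tt> or <tt>showq</tt> (the IDs below are just those from the example listing):
<pre>
qdel 1770           ## remove job 1770 from the queue
qdel 1771 1772      ## several job IDs can be given at once
</pre>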
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start processes on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is available [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is <tt>$PBS_NODEFILE</tt>, which provides the list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>pbsdsh</tt> would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set up the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell in which you run <tt>qsub</tt>. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code.
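For example, a minimal sketch of the kind of setup you might do in your script before running any code (the directory, startup file and program names here are only placeholders):
<pre>
cd /home/fred/my_working_directory          ## move to the directory containing your data
export PATH=/home/fred/my_binaries:${PATH}  ## make your own binaries visible
. /home/fred/setup.sh                       ## source any startup file your code needs
./my-code arg1 arg2
</pre>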
0e9b955029551149114f96d135f26e9b2e782e9a
65
64
2010-05-28T15:07:24Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
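As a small illustration of the precedence rule, using the <tt>myjob2.sh</tt> script above (the job name <tt>goodbye</tt> is an arbitrary example):
<pre>
qsub -N goodbye myjob2.sh  ## queued as "goodbye": the command line overrides the "#PBS -N hello" directive
</pre>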
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the second request above would actually run on one physical node using all 8 of its CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request and use 8 processes per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
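For example, to remove one of the queued jobs shown above, give <tt>qdel</tt> the numeric job ID reported by <tt>qstat</tt> or <tt>showq</tt> (the IDs below are just those from the example listing):
<pre>
qdel 1770           ## remove job 1770 from the queue
qdel 1771 1772      ## several job IDs can be given at once
</pre>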
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start processes on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is available [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is <tt>$PBS_NODEFILE</tt>, which provides the list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>pbsdsh</tt> would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set up the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell in which you run <tt>qsub</tt>. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code. For example, the following script changes to a working directory and adds to the <tt>PATH</tt> before running an [[MPI]] code:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
906bd3c417e87d3d6f9f518164e5d4bad9afa7eb
68
65
2010-05-28T15:14:43Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
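As a small illustration of the precedence rule, using the <tt>myjob2.sh</tt> script above (the job name <tt>goodbye</tt> is an arbitrary example):
<pre>
qsub -N goodbye myjob2.sh  ## queued as "goodbye": the command line overrides the "#PBS -N hello" directive
</pre>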
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the second request above would actually run on one physical node using all 8 of its CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request and use 8 processes per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
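For example, to remove one of the queued jobs shown above, give <tt>qdel</tt> the numeric job ID reported by <tt>qstat</tt> or <tt>showq</tt> (the IDs below are just those from the example listing):
<pre>
qdel 1770           ## remove job 1770 from the queue
qdel 1771 1772      ## several job IDs can be given at once
</pre>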
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start processes on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is available [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is <tt>$PBS_NODEFILE</tt>, which provides the list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>pbsdsh</tt> would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set up the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell in which you run <tt>qsub</tt>. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code. For example, the following script changes to a working directory and adds to the <tt>PATH</tt> before running an [[MPI]] code:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
48c1d3bd6e29f499b979bc0ef41a591acf4df5bc
83
68
2010-06-11T10:58:07Z
Mjh
2
add multiple job info
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
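As a small illustration of the precedence rule, using the <tt>myjob2.sh</tt> script above (the job name <tt>goodbye</tt> is an arbitrary example):
<pre>
qsub -N goodbye myjob2.sh  ## queued as "goodbye": the command line overrides the "#PBS -N hello" directive
</pre>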
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the second request above would actually run on one physical node using all 8 of its CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request and use 8 processes per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
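For example, to remove one of the queued jobs shown above, give <tt>qdel</tt> the numeric job ID reported by <tt>qstat</tt> or <tt>showq</tt> (the IDs below are just those from the example listing):
<pre>
qdel 1770           ## remove job 1770 from the queue
qdel 1771 1772      ## several job IDs can be given at once
</pre>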
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start processes on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is available [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is <tt>$PBS_NODEFILE</tt>, which provides the list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>pbsdsh</tt> would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set up the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell in which you run <tt>qsub</tt>. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code. For example, the following script changes to a working directory and adds to the <tt>PATH</tt> before running an [[MPI]] code:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to <tt>qsub</tt> allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide what to do differently in each job: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that this is safe.
A common mistake is to pass <tt>-t</tt> a single number when a range of jobs is intended:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
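As a sketch of how <tt>$PBS_ARRAYID</tt> might be used (the program name, directory and resource requests here are only placeholders):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
## submit with: qsub -t 1-4 montecarlo.qsub
cd /home/fred/my_working_directory
## use the array ID to give each of the four jobs its own output file
./my-montecarlo-code > run-${PBS_ARRAYID}.out
</pre>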
7ba8853148192b37f5d8de1e4203b19bd891593b
84
83
2010-06-11T11:00:14Z
Mjh
2
/* Basic commands */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
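As a small illustration of the precedence rule, using the <tt>myjob2.sh</tt> script above (the job name <tt>goodbye</tt> is an arbitrary example):
<pre>
qsub -N goodbye myjob2.sh  ## queued as "goodbye": the command line overrides the "#PBS -N hello" directive
</pre>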
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the second request above would actually run on one physical node using all 8 of its CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds its walltime estimate will be terminated.
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to <tt>qstat</tt> give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
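For example, to remove one of the queued jobs shown above, give <tt>qdel</tt> the numeric job ID reported by <tt>qstat</tt> or <tt>showq</tt> (the IDs below are just those from the example listing):
<pre>
qdel 1770           ## remove job 1770 from the queue
qdel 1771 1772      ## several job IDs can be given at once
</pre>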
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start processes on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is available [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is <tt>$PBS_NODEFILE</tt>, which provides the list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>pbsdsh</tt> would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set up the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell in which you run <tt>qsub</tt>. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code. For example, the following script changes to a working directory and adds to the <tt>PATH</tt> before running an [[MPI]] code:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to <tt>qsub</tt> allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide what to do differently in each job: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that this is safe.
A common mistake is to pass <tt>-t</tt> a single number when a range of jobs is intended:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
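As a sketch of how <tt>$PBS_ARRAYID</tt> might be used (the program name, directory and resource requests here are only placeholders):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
## submit with: qsub -t 1-4 montecarlo.qsub
cd /home/fred/my_working_directory
## use the array ID to give each of the four jobs its own output file
./my-montecarlo-code > run-${PBS_ARRAYID}.out
</pre>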
bb406a0f142971181dd3709c8ca258b5e6d0d56e
86
84
2010-06-17T07:08:18Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
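If the program you want to run is a compiled binary, a one-line wrapper script is all that is needed. A minimal sketch (the path to the binary is purely illustrative):
<pre>
#!/bin/sh
# wrapper script so that qsub has a script to execute;
# replace the path with the location of your own binary
/home/fred/my_binaries/my-program
</pre>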
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the maximum (wall-clock) execution time in h:m:s format; the job will be terminated if it runs for longer than this.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration, requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes, so the second request above would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always request (and use) all 8 processors per node for MPI or multi-threaded code; if you are running single-threaded code, request one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>.
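For example, using one of the job identifiers from the <tt>qstat</tt> output above (usually the numeric part of the identifier is enough):
<pre>
qdel 1770          ## remove queued job 1770
qdel 1771 1772     ## several jobs can be given at once
</pre>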
== Running code ==
Your job script will start running on ''one'' of the nodes that have been allocated to it. It is the script's responsibility to start processes on any other nodes in the allocation.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
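For example, a script can inspect the node file to see how much has been allocated to it (a small sketch, not part of the standard examples below):
<pre>
NPROCS=$(wc -l < $PBS_NODEFILE)          # one line per allocated processor
NNODES=$(sort -u $PBS_NODEFILE | wc -l)  # number of distinct nodes
echo "Running on $NNODES nodes with $NPROCS processors in total"
</pre>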
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to misread the argument that <tt>-t</tt> takes: it expects a range of array indices, not a number of jobs.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
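A minimal array-job script might look like the following sketch (program and file names are illustrative only):
<pre>
#!/bin/sh
#PBS -N array-demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
cd /home/fred/my_working_directory
# each job queued by -t receives a different $PBS_ARRAYID, used here
# to select an input file and to keep the output files separate
./my-simulation input.$PBS_ARRAYID > output.$PBS_ARRAYID
</pre>
It would be submitted with, for example, <tt>qsub -t 1-4 myjob.qsub</tt>.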
bb214ac8329c548df6ebb1da6e806df033530a4f
Main Page
0
1
52
48
2010-05-14T13:58:51Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
4a24fd4b4306aa294dfda7a047a996dcc7c32474
54
52
2010-05-17T10:47:38Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
94eeae5af50a6e24c5dc201c3a32b8a2e46d8ee5
58
54
2010-05-17T13:50:44Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
60e9f968a172950c4e2c40a1f35e607adac2ac9b
69
58
2010-05-30T07:41:55Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
3d8adda5c9b09716753401f77941e2b434256463
Compilers
0
16
53
2010-05-14T13:59:48Z
Mjh
2
Created page with 'The standard C and Fortran compilers are <tt>gcc</tt> and <tt>gfortran</tt>. The Intel compilers <tt>icc</tt> and <tt>ifort</tt> are also available and may be superior for many a…'
wikitext
text/x-wiki
The standard C and Fortran compilers are <tt>gcc</tt> and <tt>gfortran</tt>. The Intel compilers <tt>icc</tt> and <tt>ifort</tt> are also available and may be superior for many applications.
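For example, a simple serial program could be built with any of them (the flags shown are just common starting points, not a recommendation):
<pre>
gcc -O2 -o myprog myprog.c         # GNU C compiler
gfortran -O2 -o mysim mysim.f90    # GNU Fortran compiler
icc -O2 -o myprog myprog.c         # Intel C compiler
ifort -O2 -o mysim mysim.f90       # Intel Fortran compiler
</pre>
For MPI code, use the compiler wrappers (<tt>mpicc</tt>, <tt>mpif77</tt>) described on the [[MPI]] page.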
7c81ae718d3204ab98d6cc3b0751f6affbde75a4
Software
0
17
55
2010-05-17T10:53:48Z
Mjh
2
Created page with 'This page documents locations of software. Detailed local documentation of software should go in a page specific to that software. * gromacs: 4.0.7 installed in <tt>/soft/groma…'
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* gromacs: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
0b771b81b012748ea7e9647d2350f35b8d8ed34d
56
55
2010-05-17T10:53:58Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* gromacs: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
fe3ef45623b85fa43e8b1575d5778da02d82d1b1
57
56
2010-05-17T11:12:36Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* gromacs: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
* iGemDock: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
457ad71a3e4c7591a5f6bd246ea428ffc2ea3dc5
79
57
2010-06-02T09:25:30Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* [[Gromacs]]: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
* iGemDock: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
fa8b545852c44b542e221afcd777279f23362bb2
80
79
2010-06-02T09:31:12Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
* iGemDock: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
8659943ecf8b773bae944bb2505e9dffabb0117e
81
80
2010-06-02T09:31:33Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* Autodock: 4.2 installed in <tt>/soft/autodock</tt>
* iGemDock: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
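To use one of these packages from the command line or from a job script you will generally need to add its directory (or a <tt>bin</tt> sub-directory, depending on how the package is laid out) to your PATH, for example:
<pre>
export PATH=/soft/autodock/bin:$PATH    # adjust to the actual layout under /soft
</pre>
Gromacs is an exception: it provides its own setup script, as described on the [[Gromacs]] page.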
7f78613aad13b44a2fbde9dde9d84105e3591458
MPI
0
12
59
45
2010-05-18T11:27:58Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that.
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2 (at present <tt>mpiexec</tt> does not work, though it's supposed to). A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on all appropriate machines. You will need [[passwordless ssh]] set up for this to work.
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/bin/torque-mv /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
2fdc78229b8ba6c14f2ab6eb0488f9d90c653d29
60
59
2010-05-18T12:54:41Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
MVAPICH2 integration with Torque is not as good as for MPICH2 (at present <tt>mpiexec</tt> does not work, though it's supposed to). A script <tt>/soft/bin/torque-mv</tt> has been provided as a partial replacement for <tt>mpiexec</tt>; this uses the ssh-based <tt>mpirun_rsh</tt> tool to start jobs on all appropriate machines. You will need [[passwordless ssh]] set up for this to work.
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/bin/torque-mv /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
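As a starting point, an untested sketch of such a script might look much like the MPICH2 example above, but calling OpenMPI's own <tt>mpirun</tt> and pointing it at the Torque node file (paths and options should be checked against the installed OpenMPI before relying on this):
<pre>
#!/bin/sh -f
#PBS -N openmpi-demo
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:10:00
#PBS -k oe
cd $PBS_O_WORKDIR
NPROCS=$(wc -l < $PBS_NODEFILE)
# if OpenMPI has been built with Torque (tm) support, mpirun will discover
# the allocation itself; otherwise the node file is passed explicitly
mpirun -np $NPROCS -machinefile $PBS_NODEFILE /home/myusername/mympijob
</pre>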
bc30c85ef90578024715a565be2c90bdc36e7538
Architecture
0
7
61
27
2010-05-19T15:21:12Z
WikiSysop
1
wikitext
text/x-wiki
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster
* 110 TB of [[storage]] attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
26fa825472372d8a5b67081c276dd53f73e16df5
62
61
2010-05-20T15:11:27Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is an 8-core Xeon-based machine with 32 GB RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 GB RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 8-core Xeons with 12 GB RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* 110 TB of [[storage]] attached via Fibre Channel to the head node
* Ethernet and Infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
7b66f021aeee6413bca6858eef7680da2c0a6390
Passwordless ssh
0
13
66
44
2010-05-28T15:13:11Z
Mjh
2
wikitext
text/x-wiki
For some applications (including use of the job submission system) you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key with no passphrase (just press return when prompted).
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* Passwordless ssh is now set up: try e.g. <tt>ssh node001 hostname</tt> to test it.
72fd93ffcb7a5e2021fea019daf8a5ef78c980fb
67
66
2010-05-28T15:13:37Z
Mjh
2
wikitext
text/x-wiki
For some applications (including use of the [[jobs|job submission system]]) you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key with no passphrase (just press return when prompted).
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* Passwordless ssh is now set up: try e.g. <tt>ssh node001 hostname</tt> to test it.
a72bcb9c5d294396d9365461bda127b1c51e5c93
71
67
2010-06-01T09:08:13Z
Mjh
2
wikitext
text/x-wiki
For some applications (including use of the [[jobs|job submission system]]) you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key with no passphrase (just press return when prompted).
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* Passwordless ssh is now set up: try e.g. <tt>ssh node001 hostname</tt> to test it.
Note that you are ''not'' permitted to use this to run jobs on the nodes: see [[Policies]] for more.
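The same steps as a sequence of commands, run on the head node (accept the default key location and leave the passphrase empty when prompted):
<pre>
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
ssh node001 hostname    # should print the node's name without asking for a password
</pre>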
bb4735e1cb4955873784c3e577f22f337a7ec656
Mail
0
18
70
2010-05-30T07:45:48Z
Mjh
2
Created page with 'Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt…'
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt>mutt</tt> on the head node.
You are advised to set up a <tt>.forward</tt> file which will send it to your normal inbox. To do this, simply create the file containing one line of text, the e-mail address to forward to:
<pre>
cat <<END >.forward
f.bloggs@herts.ac.uk
END
</pre>
Please don't allow your inbox on the cluster to fill up with large messages.
d3cb935f23039bce1ac1d56bd2b0dcd4b2c39b43
Gromacs
0
19
72
2010-06-02T09:11:35Z
Akukol
3
Run Gromacs
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
0cb22746efc497b15d1f65b26f41c7e970e885ad
73
72
2010-06-02T09:16:24Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
5260f556b2a44c46c12e244be839b0714d616b09
74
73
2010-06-02T09:17:39Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
9a66022d9cadcb0af31471a5dcacda2ab8a35c15
75
74
2010-06-02T09:18:14Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
2c856e437dd24cb0503971f5e0a8727e4831a97b
76
75
2010-06-02T09:19:20Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
5097f80631b90487df0a4a78ff1cbf0a2098c660
77
76
2010-06-02T09:20:40Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
ab4e045557e0d2fde183acfba81b19c863eedafd
78
77
2010-06-02T09:21:32Z
Akukol
3
gromacs
wikitext
text/x-wiki
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
6cd16cdb43ca951d1d94ac04c4918bd04dfa19fa
82
78
2010-06-02T09:38:45Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below.
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the working directory and the details of the mdrun command.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -u akukol
# runs a job with name GromacsTest on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# merge 'output' and 'standard error' and output both to 'standard output'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
648e371a84eb3e8678d696f5c8e8f87807042058
88
82
2010-06-17T08:34:36Z
Akukol
3
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|Here are all the options explained.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default [[Queues|walltime]] on the 'main' queue is 24 hours. If you don't specify any walltime, the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
45ed455d237e8daa2165784d2acfd0aa3fa4b37b
89
88
2010-06-17T08:35:20Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|Here are all the options explained.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default [[Queues|walltime]] on the 'main' queue is 24 hours. If you don't specify any walltime, the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
d9c62ebc1848c9b3abaa4e6834b06fcd95c5bdbc
90
89
2010-06-17T08:37:37Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. More explanations: [[Jobs]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default [[Queues|walltime]] on the 'main' queue is 24 hours. If you don't specify any walltime, the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
2a4e69b1e9052372c0dd001aa4605b7f93e11302
91
90
2010-06-17T08:39:21Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. More explanations: [[Jobs]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any walltime, the job will stop after 24 hours. More info about walltimes: [[Queues]]
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
37f274a3a5ecb7abcb21ba75a22bcf155c05e3fb
92
91
2010-06-17T08:41:36Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More explanations.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any walltime, the job will stop after 24 hours. More info about walltimes: [[Queues]]
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
a57cd6222dc99204d7cc11d5cbf0bc37b29fcce4
93
92
2010-06-17T08:42:53Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
Gromacs is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
4582024133ceefbcfa4c5d94ff64afdb65d8614c
94
93
2010-06-17T08:47:33Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org/ Gromacs] is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
ec9848b95954370827db4a358b45c49dfa3a3ab3
95
94
2010-06-17T08:47:44Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org/ Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
e5b3df18473a441ceb762a1e034757d592928b07
96
95
2010-06-17T08:49:40Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org/ Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
a013927385dff87a6fd36361b1d3df3eb978596a
97
96
2010-06-17T08:52:06Z
Akukol
3
wikitext
text/x-wiki
<a href="http://www.gromacs.org" target="_blank">Gromacs</a> is a software for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a c-shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
34556be0c275cfc87bea737f60b1f15823b2a15d
98
97
2010-06-17T08:53:02Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
555168f1e46353e4d9b19cb9ea7e9cba13e9dba6
99
98
2010-06-17T08:53:13Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
cad460a95ec1b44a746cefc4c690a2f80eecad25
100
99
2010-06-17T09:24:15Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
Look here for [[groperform|optimising performance]].
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
a83332b470d0d51f119cd67a408d30d106530ced
Queues
0
15
85
50
2010-06-17T07:06:59Z
Mjh
2
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* Finally 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters.
== Default wall times ==
The default wall time limitation for the 'all' and 'cair_s' queues is also the maximum, i.e. 2 hours and 6 hours respectively.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
718dcd07c4345d1e8e241ad27decd2e560e0349e
87
85
2010-06-17T07:19:45Z
Mjh
2
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* Finally 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters. This queue should only be used in unusual circumstances -- normally it will be better to use one of the others.
== Default wall times ==
The default wall time limitation for the 'all' and 'cair_s' queues is also the maximum, i.e. 2 hours and 6 hours respectively.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
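For example, to let a job on the 'main' queue run for longer than the 24-hour default, request the wall time explicitly in your job script. The directives below follow the job script examples elsewhere on this wiki; the values are illustrative only.
<pre>
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=48:00:00
</pre>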
4cca22adbe2ec2a3d819b36044cbc5fe396bee3b
Groperform
0
20
101
2010-06-17T09:45:18Z
Akukol
3
Created page with '== '''How to optimise the performance of Gromacs on the cluster''' == 1) Perform a short simulation (100 ps) using the same number of nodes as you would for the long simulation.…'
wikitext
text/x-wiki
== '''How to optimise the performance of Gromacs on the cluster''' ==
1) Perform a short simulation (100 ps) using the same number of nodes as you would for the long simulation.
2) Analyse the output of mdrun:
For example
<pre>
Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
...
...
Average load imbalance: 17.3 %
Part of the total run time spent waiting due to load imbalance: 8.3 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 9 %
Average PME mesh/force load: 1.370
Part of the total run time spent waiting due to PP/PME imbalance: 14.6 %
</pre>
3) If you use more than 12 cores, you should optimise the number of PME-only nodes/cores by using the -npme option of mdrun. For the example above try '-npme 8'. The number of PME-only nodes cannot be larger than half the total number of nodes.
4) If the energy file is not required for further analysis, the option -nosum can be used.
Note that the above options are mainly relevant when using more than one node (more than 8 cores), in order to optimise the communication between nodes.
9bf13da0acde56145ebbf5d5de4ff53205b329b1
102
101
2010-06-17T09:48:33Z
Akukol
3
/* How to optimise the performance of Gromacs on the cluster */
wikitext
text/x-wiki
== '''How to optimise the performance of Gromacs on the cluster''' ==
1) Perform a short simulation (100 ps) using the same number of nodes as you would for the long simulation.
2) Analyse the output of mdrun:
For example
<pre>
Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
...
...
Average load imbalance: 17.3 %
Part of the total run time spent waiting due to load imbalance: 8.3 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 9 %
Average PME mesh/force load: 1.370
Part of the total run time spent waiting due to PP/PME imbalance: 14.6 %
</pre>
3) If you use more than 12 cores, you should optimise the number of PME-only nodes/cores by using the -npme option of mdrun. For the example above try '-npme 8'. The number of PME-only nodes cannot be larger than half the total number of nodes.
4) If the energy file is not required for further analysis, the option -nosum can be used.
Note that the above options are mainly relevant when using more than one node (more than 8 cores), in order to optimise the communication between nodes.
291bd607ffa95ff0b2ec1f5995968ab0656b4f48
103
102
2010-06-17T09:50:17Z
Akukol
3
/* How to optimise the performance of Gromacs on the cluster */
wikitext
text/x-wiki
== '''How to optimise the performance of Gromacs on the cluster''' ==
1) Perform a short simulation (100 ps) using the same number of nodes as you would for the long simulation.
2) Analyse the output of mdrun:
For example
<pre>
Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
...
...
Average load imbalance: 17.3 %
Part of the total run time spent waiting due to load imbalance: 8.3 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 9 %
Average PME mesh/force load: 1.370
Part of the total run time spent waiting due to PP/PME imbalance: 14.6 %
</pre>
3) If you use more than 12 cores, you should optimise the number of PME-only nodes/cores by using the -npme option of mdrun. For the example above try '-npme 8'. The number of PME-only nodes cannot be larger than half the total number of nodes.
4) If the energy file is not required for further analysis, the option -nosum can be used.
Note that the above options are mainly relevant when using more than one node (more than 8 cores), in order to optimise the communication between nodes.
8f430c9fbb311dfec9cfec55ff578293d3ba32cb
104
103
2010-06-17T09:51:13Z
Akukol
3
wikitext
text/x-wiki
== '''How to optimise the performance of Gromacs on the cluster''' ==
1) Perform a short simulation (100 ps) using the same number of nodes as you would for the long simulation.
2) Analyse the output of mdrun:
For example
<pre>
Will use 10 particle-particle and 6 PME only nodes
This is a guess, check the performance at the end of the log file
...
...
Average load imbalance: 17.3 %
Part of the total run time spent waiting due to load imbalance: 8.3 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 9 %
Average PME mesh/force load: 1.370
Part of the total run time spent waiting due to PP/PME imbalance: 14.6 %
</pre>
3) If you use more than 12 cores, you should optimise the number of PME-only nodes/cores by using the -npme option of mdrun. For the example above try '-npme 8'. The number of PME-only cores cannot be larger than half the total number of cores.
4) If the energy file is not required for further analysis, the option -nosum can be used.
Note that the above options are mainly relevant when using more than one node (more than 8 cores), in order to optimise the communication between nodes.
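As a sketch, for the example output above the mdrun line of the [[Gromacs]] job script might become the following; only -npme is added, all other options are as before.
<pre>
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -npme 8 -v -stepout 1000
</pre>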
4ddbfacf70de9af0bd10be688e4c154e268df62e
Gromacs
0
19
105
100
2010-06-17T10:07:12Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster (a minimal grompp example is given below). If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
Look here for [[groperform|optimising performance]].
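If you still need to generate the tpr file, a minimal grompp invocation for Gromacs 4.0.x is sketched below; md.mdp, conf.gro and topol.top are placeholder names for your own parameter, structure and topology files.
<pre>
# run on the headnode (or on a local machine with the same Gromacs version)
source /soft/gromacs/bin/GMXRC
grompp -f md.mdp -c conf.gro -p topol.top -o md_test.tpr
</pre>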
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' in the 'main' queue
# uses 1 node and 8 CPU cores (each node has 8 cores)
# set a maximum time of two hours (walltime)
# merge standard error into standard output (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs/bin/GMXRC
export LD_LIBRARY_PATH="/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH"
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
7250620a34cc962a37fc0b1548ca3bacb299df34
Accounts
0
3
106
29
2010-06-17T11:37:47Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to John Atkinson in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research (CAIR)
* Other research-active members of the School of Physics, Astronomy and Mathematics (PAM)
* Members of the School of Computer Science (CS)
* Others, by special arrangement; restricted to those who have made a financial contribution to the cluster.
Access is granted subject to observance of our usage [[policies]].
c53f75292ee04ffd37baaf6903af37b43f7f56ff
Software
0
17
107
81
2010-06-18T09:24:39Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock Vina: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
3806f00214c69ae60ba0d54caa964f377c7a600a
113
107
2010-06-18T09:45:01Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
ff0b9f37484eb9c6f4e853931b59a9a38a0d2313
114
113
2010-06-18T09:45:30Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
2b74f156f1a0fe4d84db916950c3c7c27a2605ae
IGemDock
0
21
108
2010-06-18T09:28:12Z
Akukol
3
Created page with 'IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking. It is particular suited for virtual screening using many processors.'
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking. It is particularly suited for virtual screening using many processors.
bcc4992fadfd990d5aa12aaebaecef93f0757073
109
108
2010-06-18T09:30:02Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking. It is particularly suited for virtual screening using many processors.
Start with /soft/iGEMDOCKv2.1-centos/bin/iGemdock
The molecular docking engine is /soft/iGEMDOCKv2.1-centos/bin/mod_ga
1dfde735fc4dcc33c3ef023a825a5d9abc9b0d89
117
109
2010-06-18T10:16:50Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking. It is particularly suited for virtual screening using many processors.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .bashrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
0ac48ab6fa1213b3e35c117c927c7d4248742aea
118
117
2010-06-23T15:56:02Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .bashrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
d22a875683384edea9bbb2130a0dac8a626e9490
119
118
2010-06-23T15:56:29Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .cshrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
cd637363f6323e7432e85196427a719bc5fbf531
130
119
2010-07-02T13:50:26Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .cshrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
<pre>#!/bin/sh
#PBS -N GemD_comt2
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -u akukol
#PBS -l walltime=250:00:00
export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH
cd /home/akukol/data/vscreenTest/comt2_gemdock
### This is the command ###
/usr/local/bin/mpiexec /soft/iGEMDOCKv2.1-centos/bin/mod_ga -f docking.dock
### command end ###
# start with 'qsub RunGemdock.sh'
</pre>
2ef81b03615686a4afd930fedd628eca6511f7f7
131
130
2010-07-02T13:50:38Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .cshrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
<pre>#!/bin/sh
#PBS -N GemD_comt2
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -u akukol
#PBS -l walltime=250:00:00
export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH
cd /home/akukol/data/vscreenTest/comt2_gemdock
### This is the command ###
/usr/local/bin/mpiexec /soft/iGEMDOCKv2.1-centos/bin/mod_ga -f docking.dock
### command end ###
# start with 'qsub RunGemdock.sh'
</pre>
a96a7276a8bcfeb7ded719a1f02144b5b898303e
132
131
2010-07-02T13:52:02Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your .cshrc, so it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
This is the script RunGemdock.sh that you need (remember to make RunGemdock.sh executable):
<pre>#!/bin/sh
#PBS -N GemD_comt2
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -u akukol
#PBS -l walltime=250:00:00
export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH
cd /home/akukol/data/vscreenTest/comt2_gemdock
### This is the command ###
/usr/local/bin/mpiexec /soft/iGEMDOCKv2.1-centos/bin/mod_ga -f docking.dock
### command end ###
# start with 'qsub RunGemdock.sh'
</pre>
b7a6f42451794469ae2cb23ba799c6201d6db426
133
132
2010-07-02T13:52:30Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(put it in your shell start-up file so that it is set automatically when you log in; note that the export syntax above is for bash/sh, so in .cshrc you would need the equivalent setenv command)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
This is the script RunGemdock.sh that you need (remember to make it executable):
<pre>#!/bin/sh
#PBS -N GemD_comt2
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -u akukol
#PBS -l walltime=250:00:00
export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH
cd /home/akukol/data/vscreenTest/comt2_gemdock
### This is the command ###
/usr/local/bin/mpiexec /soft/iGEMDOCKv2.1-centos/bin/mod_ga -f docking.dock
### command end ###
# start with 'qsub RunGemdock.sh'
</pre>
f03765c2a432518d8fdd4f5cb4488470298f0ffb
Autodock
0
22
110
2010-06-18T09:41:08Z
Akukol
3
Created page with '[http://autodock.scripps.edu/ AutoDock] is software for molecular docking. It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads Auto…'
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. akukol/bin/MGLtools.
You need to download the file mgltools_x86_64Linux2_1.5.4.tar.gz. The automatic installer does not work.
82021f4b4b0d4c48f0fde9298677d72320299196
111
110
2010-06-18T09:41:44Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
03ad99376c68e062700dd0023d6964603e8d2de1
112
111
2010-06-18T09:44:13Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
1b7b7bf69703566df7feaea2f9f7f1427a87d3f3
120
112
2010-06-23T16:07:27Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT=""
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with thousands of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
08b1911982264aee959cac6b677dd4b28df6b602
121
120
2010-06-23T16:08:21Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT=""
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with thousands of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
1ba75e8cea04be72dcac1738bbde9e429d826c19
148
121
2010-07-22T14:22:15Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT="-j oe" # join output and error
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with thousands of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
c369ebb49eea40e519cdf45145616e90d060bd3b
149
148
2010-07-22T14:25:28Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
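A minimal sketch of the manual installation, assuming the archive has been downloaded to your home directory and you want it under ~/bin (both locations are just examples):
<pre>
mkdir -p ~/bin/MGLtools
cd ~/bin/MGLtools
tar xzf ~/mgltools_x86_64Linux2_1.5.4.tar.gz
# the archive unpacks here; set it up by hand, since the automatic installer does not work
</pre>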
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT="-j oe -N AutoDock" # join output and error, job name: Autodock
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with thousands of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
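For reference, each per-ligand job file written by the loop above is just a three-line wrapper of the following form; 'ligand001' and the directory are placeholders for one entry of jobs_list inside your working path.
<pre>
#!/bin/bash
cd /home/username/myscreening/ligand001
/soft/autodock/autodock4 -p ligand001.dpf -l ligand001.dlg
</pre>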
2f099ce00e9c2a490ae11f1d92fd6c9a6e1cf18d
Vina
0
23
115
2010-06-18T09:48:33Z
Akukol
3
Created page with '[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking. It is very easy to use for virtual screening with shell scripts as explained in the manual. The mole…'
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[AutoDock]]).
c7e261f37833a5296c8d87826dc7902021eb7cff
116
115
2010-06-18T09:49:18Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
720ff398c5065c24800b8e6f00cbbd3069206cf4
122
116
2010-06-23T16:13:54Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash &' (not qsub).
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N VinaScreen -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
406c0c5f3294ec251305b34ae9b1059b1ae57c88
123
122
2010-06-23T16:14:47Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash &' (not qsub).
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N VinaScreen -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
c63b833f47732a36c4362ab783e9327f5bf50f85
124
123
2010-06-23T16:15:17Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash &' (not qsub).
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N VinaScreen -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
0df39ba0b810d567af94baad014afdb936372bd6
128
124
2010-07-02T13:45:57Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N VinaScreen -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
2bdb7e4feeefb941a39372e35433dc482e9ed625
129
128
2010-07-02T13:46:51Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N $f -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
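For completeness, a minimal conf.txt might look like the sketch below; the receptor file name, search box centre/size and CPU count are placeholders that must be replaced with values appropriate to your own target.
<pre>
receptor = receptor.pdbqt
center_x = 0.0
center_y = 0.0
center_z = 0.0
size_x = 20
size_y = 20
size_z = 20
cpu = 8
exhaustiveness = 8
</pre>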
221257bc4c7f7c214da6d54c5ff7563608c0d202
Architecture
0
7
125
62
2010-06-24T11:06:28Z
Akukol
3
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 16-core Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 8-core Xeons with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
45bacc25467e6c1513b75a50f1b1636eb46bd1fa
135
125
2010-07-13T06:26:21Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 16-core Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 8-core Xeons with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 8-core Xeons with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]], each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
0772ed16211bc52a9d115f6a078bcf175bb803e7
145
135
2010-07-21T12:33:04Z
Cjoslin
5
Not 8 cores
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 16-core Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeons (1 socket x 4 cores x 2 hyperthreads) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (1 socket x 4 cores x 2 hyperthreads) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]], each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
57b1ca77678cd32d56c5ea411fedf85f5e52adb5
146
145
2010-07-21T12:35:14Z
Cjoslin
5
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 16-core Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeons (1 socket x 4 cores x 2 hyperthreads) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (1 socket x 4 cores x 2 hyperthreads) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]], each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
036b1e5ab02629324e6f70b593c440d53b861250
147
146
2010-07-21T12:36:43Z
Cjoslin
5
not 16 core head node
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeons (1 socket x 4 cores x 2 hyperthreads) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (1 socket x 4 cores x 2 hyperthreads) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]], each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
dbced85d9ea4b071e101c870ff2e8e0344875940
150
147
2010-07-22T14:29:53Z
Cjoslin
5
Corrected nodes info
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeon (E5520) nodes, each 2 sockets x 4 cores with Hyper-Threading off, with 24 Gb RAM and DDR Infiniband, form the Main cluster (chassis 1, 2 and 3)
** 32 Xeon (E5520) nodes, each 2 sockets x 4 cores with Hyper-Threading off, with 12 Gb RAM and QDR Infiniband, form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]], each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
69813107400e0db9da7d8d21505d76f8e6ab549b
MPI
0
12
126
60
2010-06-24T21:55:02Z
Mjh
2
/* MVAPICH2 */ mpiexec works now
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than they would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command mpi-selector-menu on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
6d5d5ce91b07c6963968acdcee899c0378cba10b
127
126
2010-06-28T16:18:40Z
Mjh
2
/* MVAPICH2 */
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than they would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
2282d70549048621a9c70a8b911e4af96e8f711c
134
127
2010-07-07T13:32:40Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Infiniband network, but will not use it natively: thus the latency and bandwidth of IPC will be about an order of magnitude worse than they would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
b367fe1e9f230b735a8a4ebd2c3778d8eec2572c
SMP machines
0
24
136
2010-07-13T07:01:22Z
Mjh
2
Created page with 'The SMP machines are two 4-processor, 48-core systems each with 256 Mb of RAM. The individual cores in these machines are slightly slower (~ 20% on CPU-bound floating-point oper…'
wikitext
text/x-wiki
The SMP machines are two 4-processor, 48-core systems each with 256 Mb of RAM.
The individual cores in these machines are slightly slower (~ 20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes. So users are recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
The big advantage of the SMP machines is the large amount of physical memory visible to all 48 cores. This allows for multi-threaded, shared-memory applications.
The SMP machines also each have 10 Tb of local scratch disc space (mounted as /scratch).
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque. We will also permit direct login in certain circumstances. Please discuss your requirements with us.
== Restrictions ==
The SMP machines are 'at risk' until further notice while we continue to configure them.
smp2 does not have its scratch disc set up yet due to a faulty hard drive.
The SMP machines are running FC13 and Linux 2.6.35-rc4 (!). This is slightly different from other nodes of the cluster; please be alert to problems this may cause.
Infiniband-aware MPI code will not run on the SMP machines, as the libraries are not yet installed (the underlying OS is too new for the OFED packages to compile). Since it is probably not sensible to run jobs spanning the main cluster and the smp machines (and since this is not currently possible via Torque in any case) this should not be a serious restriction.
040ef8a2a074eb69c03a14c39dca079bd45cd481
142
136
2010-07-13T07:28:21Z
Mjh
2
wikitext
text/x-wiki
The SMP machines are two 4-processor, 48-core systems each with 256 Gb of RAM.
The individual cores in these machines are slightly slower (~ 20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes. So users are recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
The big advantage of the SMP machines is the large amount of physical memory visible to all 48 cores. This allows for multi-threaded, shared-memory applications.
The SMP machines also each have 10 Tb of local scratch disc space (mounted as /scratch).
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque. We will also permit direct login in certain circumstances. Please discuss your requirements with us.
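As a minimal sketch of submitting through that queue (the script name is a placeholder, and this assumes the SMP nodes are configured to offer all 48 cores, i.e. ppn=48):
<pre>
qsub -q smp -l nodes=1:ppn=48 -l walltime=12:00:00 my_threaded_job.sh
</pre>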
== Restrictions ==
The SMP machines are 'at risk' until further notice while we continue to configure them.
smp2 does not have its scratch disc set up yet due to a faulty hard drive.
The SMP machines are running FC13 and Linux 2.6.35-rc4 (!). This is slightly different from other nodes of the cluster; please be alert to problems this may cause.
Infiniband-aware MPI code will not run on the SMP machines, as the libraries are not yet installed (the underlying OS is too new for the OFED packages to compile). Since it is probably not sensible to run jobs spanning the main cluster and the smp machines (and since this is not currently possible via Torque in any case) this should not be a serious restriction.
49ecb818c970b99aceab7d32aef738517b8908c1
Main Page
0
1
137
69
2010-07-13T07:01:49Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
ad43eb102dc4e0355d37645478e34d0404843c02
Queues
0
15
138
87
2010-07-13T07:03:11Z
Mjh
2
wikitext
text/x-wiki
There are five possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters. This queue should only be used in unusual circumstances -- normally it will be better to use one of the others.
* 'smp' submits to the two [[SMP machines]].
== Default wall times ==
The default wall time for the 'all' and 'cair_s' queues is also the maximum, i.e. 2 hours and 6 hours respectively.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
014e0f8534164bf1d059b30e2f97f342c9add5ce
139
138
2010-07-13T07:04:15Z
Mjh
2
wikitext
text/x-wiki
There are five possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'all' submits to all 80 nodes, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters. This queue should only be used in unusual circumstances -- normally it will be better to use one of the others.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run jobs that span the SMP machines and the main or CAIR clusters.
== Default wall times ==
The default wall time for the 'all' and 'cair_s' queues is also the maximum, i.e. 2 hours and 6 hours respectively.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
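For example (the script name here is just a placeholder), to submit to the short CAIR queue with an explicit four-hour wall time:
<pre>
qsub -q cair_s -l walltime=4:00:00 -l nodes=8:ppn=8 myjob.sh
</pre>
If no <tt>-q</tt> option is given, the job goes to the default 'main' queue.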
25f046a1287eaa9f6ce161fa60e7bebcad0cb5bf
Networking
0
10
140
26
2010-07-13T07:06:55Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected together via a main ethernet switch. Management traffic also uses this switch.
There are in fact two Infiniband networks: one for the main cluster, which is dual data-rate (DDR, nominally 5 Gbit/s per lane), and one for the CAIR cluster, which is quad data-rate (QDR, nominally 10 Gbit/s per lane). As for the ethernet, each chassis has an internal Infiniband switch. The two sub-clusters each have an external Infiniband switch which connects the chassis that make up that sub-cluster, and the head node is connected, separately, to both of those switches. There is therefore no native Infiniband connectivity between the two sub-clusters. IP over Infiniband can be used to connect between them, but this should not be relied on for large data volumes as it must be routed via the head node.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is somewhat lower and data transfer rates somewhat higher between nodes in the same chassis than between nodes in different chassis of the same sub-cluster; ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still, and native Infiniband connections between the two sub-clusters are not possible at all. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all see the public Internet via IP masquerading on the head node, over the ethernet network. (For this reason, it is not sensible to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx or 4.xxx, where the 3 and 4 refer to the main and CAIR clusters respectively). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The two SMP machines are attached to the Infiniband network of the main cluster and are called smp1 and smp2: thus they have addresses smp1.data, smp1.infi etc.
064d9d69c0e57bf732c7bf7740f11a683c722412
143
140
2010-07-20T12:42:53Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected via a main ethernet switch. Management traffic also uses this switch. The links connecting the head node and the SMP machines, and the uplinks from the two stacks of chassis-internal ethernet switches, are carried over multiple physical cables using link aggregation.
There are in fact two Infiniband networks: one for the main cluster, which is dual data-rate (DDR, nominally 5 Gbit/s per lane), and one for the CAIR cluster, which is quad data-rate (QDR, nominally 10 Gbit/s per lane). As for the ethernet, each chassis has an internal Infiniband switch. The two sub-clusters each have an external Infiniband switch which connects the chassis that make up that sub-cluster, and the head node is connected, separately, to both of those switches. There is therefore no native Infiniband connectivity between the two sub-clusters. IP over Infiniband can be used to connect between them, but this should not be relied on for large data volumes as it must be routed via the head node.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is somewhat lower and data transfer rates somewhat higher between nodes in the same chassis than between nodes in different chassis of the same sub-cluster; ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still, and native Infiniband connections between the two sub-clusters are not possible at all. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all see the public Internet via IP masquerading on the head node, over the ethernet network. (For this reason, it is not sensible to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx or 4.xxx, where the 3 and 4 refer to the main and CAIR clusters respectively). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The two SMP machines are attached to the Infiniband network of the main cluster and are called smp1 and smp2: thus they have addresses smp1.data, smp1.infi etc.
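As a quick illustration of the two sets of names (run from the head node or any node; node001 and smp1 are just the examples used above):
<pre>
ping -c 1 node001.data   # the node's gigabit ethernet interface
ping -c 1 node001.infi   # the same node's IP-over-Infiniband interface
ping -c 1 smp1.infi      # an SMP machine over Infiniband
</pre>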
b525a5ffe8b47c0504f134287432979336be223c
Storage
0
8
141
17
2010-07-13T07:08:10Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 65 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
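To see how much space is in use on each of these areas, the standard <tt>df</tt> command can be run on the head node, e.g.:
<pre>
df -h /home /stri-data /cair-data /soft
</pre>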
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large astronomical datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
418f5eafc7007c1a62b5b5061d168d982b2cab1e
Jobs
0
9
144
86
2010-07-20T13:25:44Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
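The [[passwordless ssh]] page has the full instructions; a minimal recipe (assuming RSA keys, and relying on home directories being shared across all nodes -- see [[storage]]) looks like this:
<pre>
ssh-keygen -t rsa                                  # accept the default location; leave the passphrase empty
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorise the key for logins within the cluster
chmod 600 ~/.ssh/authorized_keys
</pre>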
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001.infi:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the second request above would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change the resources requested by a queued job with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
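For example, using the queued job IDs from the <tt>qstat</tt> listing above (purely illustrative):
<pre>
qdel 1770                          # remove job 1770 from the queue
qalter -l walltime=2:00:00 1771    # change the wall time requested by queued job 1771
qmove main 1772                    # move job 1772 to the 'main' queue
</pre>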
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
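For example, a sketch of how a script might inspect its allocation before starting any work:
<pre>
# count the processor slots allocated and list the unique node names
NPROCS=`wc -l < $PBS_NODEFILE`
echo "Job has $NPROCS processor slots on the following nodes:"
sort -u $PBS_NODEFILE
</pre>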
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to give <tt>-t</tt> a single number when a range of jobs is intended:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
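A sketch of a script that uses the array ID (the working directory and program name are hypothetical):
<pre>
#!/bin/sh
#PBS -N mc-array
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -k oe
cd /home/fred/montecarlo
# each array element writes its results to its own output file
./my_simulation > output_${PBS_ARRAYID}.dat
</pre>
Submitted with <tt>qsub -t 1-4</tt>, this produces output_1.dat to output_4.dat.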
d6f7d92175cfeb22d1d44d49a9e037f80f28cda5
Architecture
0
7
151
150
2010-08-25T12:10:07Z
Cjoslin
5
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeon (E5520) nodes (2 sockets x 4 cores, hyperthreading off) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeon (E5520) nodes (2 sockets x 4 cores, hyperthreading off) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]] each with 48 cores, 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
7d3ebab032590efa2e6fbab1948e38a4fa7fbb93
164
151
2010-12-06T12:17:05Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeon (E5520) nodes (2 sockets x 4 cores, no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeon (E5520) nodes (2 sockets x 4 cores, no hyperthreading) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
24f495ddfb026d10cab4c30f89e68eb90a51d22a
198
164
2011-03-23T15:55:17Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeon (E5520) nodes (2 sockets x 4 cores, no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeon (E5520) nodes (2 sockets x 4 cores, no hyperthreading) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 4 sandbox machines (each with 2 x Xeon E5345 4-core CPUs, 8 Gb RAM, no infiniband)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
fe532d6bd3f2c0bb5dd9ffb2c02fe872f8fb35c9
MPI
0
12
152
134
2010-09-08T12:22:36Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than they would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
58be49ca4df8c782bd850c92a5212fc9465ae7e9
156
152
2010-10-15T07:22:29Z
Mjh
2
/* MVAPICH2 */
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than they would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2, run the command <tt>mpi-selector-menu</tt> on the head node and choose the appropriate option. When you next log in, your path and libraries will be set up correctly:
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs (but see [[Known problems]]).
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected using <tt>mpi-selector-menu</tt>. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
100deabe4e89b05ddf9f7a56da9c8e46580c554c
194
156
2011-03-21T21:34:51Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
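For example, a hybrid MPI/OpenMP job typically wants one MPI process per node and one OpenMP thread per core. The sketch below is illustrative only: it assumes that this <tt>mpiexec</tt> build accepts a <tt>-pernode</tt> option (check the option list as described above) and that <tt>my-hybrid-job</tt> is an OpenMP-enabled MPI binary.
<pre>
#!/bin/sh -f
#PBS -N hybrid-demo
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:10:00
#PBS -k oe
# One MPI process per node, 8 OpenMP threads per process (16 nodes x 8 cores).
# -pernode is assumed to be available; check mpiexec's option list on this installation.
export OMP_NUM_THREADS=8
/usr/local/bin/mpiexec -pernode /home/myusername/my-hybrid-job
</pre>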
=== MPICH2 (local versions) ===
Locally compiled, more up-to-date versions of MPICH2 are available. To use these, run the appropriate <tt>modules</tt> commands:
<pre>
module unload mpich2-x86_64
module load mpich2-local
OR
module load mpich2-intel
</pre>
Then
<pre>
which mpicc
/soft/mpich2/bin/mpicc
</pre>
If you wish to use these permanently, we recommend putting these module commands in your .cshrc or .bashrc.
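For example, bash users might add something like the following to <tt>~/.bashrc</tt> (tcsh users would put the equivalent lines in <tt>~/.cshrc</tt>); <tt>mpich2-local</tt> here is just one of the two options shown above:
<pre>
# ~/.bashrc: always use the locally compiled MPICH2
module unload mpich2-x86_64
module load mpich2-local
</pre>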
Currently you should run MPI jobs of this sort using the built-in <tt>mpiexec</tt> command, which is Torque-aware. For example,
<pre>
#!/bin/sh -f
#PBS -N sandbox-mpi
#PBS -m abe
#PBS -l nodes=4:ppn=8
#PBS -k oe
#PBS -q sandbox
#PBS -l walltime=00:02:00
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: array ID is $PBS_ARRAYID
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/soft/mpich2/bin/mpiexec -rmk pbs /home/mjh/c/mpi/examples/basic/cpi
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2 do
<pre>
module unload mpich2-x86_64
module load mvapich2
</pre>
Then you should see
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs (but see [[Known problems]]).
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected via the [[modules]] system. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it, you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
080752f9469dca13ee5b32c3a351f62f48bcc15f
Storage
0
8
153
141
2010-09-16T17:10:52Z
Mjh
2
policy, backups
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 TB of user home directories, mounted as /home
* 65 TB of scratch available to all users, mounted as /stri-data
* 40 TB of scratch for CAIR users only, mounted as /cair-data
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc.). It should not be used for general data or for large quantities of output from jobs. Large files should be stored on the relevant data disc; if you want to work under /home, use a symbolic link, as in the sketch below.
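For example (the username and directory names are illustrative):
<pre>
mkdir -p /stri-data/myusername/bigrun
ln -s /stri-data/myusername/bigrun ~/bigrun   # work in ~/bigrun; the data live on scratch
</pre>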
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large astronomical datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10 TB scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data.
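One simple way to back up your home directory to a machine you control is <tt>rsync</tt> over ssh; a minimal sketch (the destination host and directory are illustrative):
<pre>
# copy the contents of your cluster home directory to a backup area on your desktop
rsync -av ~/ myusername@my-desktop.herts.ac.uk:cluster-home-backup/
</pre>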
ed090bf31b2487c7b4a2538824765003e79dd8f0
Main Page
0
1
154
137
2010-10-15T07:03:57Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
b08b82541e7f4bbb44b8a3c3b0f66f823dcbaaa2
168
154
2010-12-06T12:54:15Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Tesla]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
3f50b7b77461de898e2f5e4c2388b690684cbd70
178
168
2011-02-18T09:02:25Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Tesla]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
04b66f658064190574f1d7c5abcede3ba188cec5
185
178
2011-03-10T15:11:57Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
ca65dd0e1cafd49d3fbf7b6b6e553f53d0d77558
188
185
2011-03-16T20:27:31Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
2e03eddd92d8261297dea022398a632339d142a7
191
188
2011-03-21T20:57:11Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
df88f7325255b17bd79b96216e058530626f62a4
Known problems
0
25
155
2010-10-15T07:21:28Z
Mjh
2
Created page with '== Known problems == * Nodes 001-008 of the main cluster are powered off, and have been for some months. This is because the air conditioning capacity in the server room is not …'
wikitext
text/x-wiki
== Known problems ==
* Nodes 001-008 of the main cluster are powered off, and have been for some months. This is because the air conditioning capacity in the server room is not adequate. We are working with estates to solve this problem.
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes and will only be solved by an upgrade.
* The Fedora distribution on the nodes (FC12) is rather old in general and should probably be upgraded.
* There is a problem on certain nodes whereby mpiexec fails to start MVAPICH2 jobs: errors of the following form are seen:
<pre>
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(311).......: Initialization failed
MPID_Init(191)..............: channel initialization failed
MPIDI_CH3_Init(163).........:
MPIDI_CH3I_RDMA_init(184)...:
rdma_setup_startup_ring(373): cannot create cq
</pre>
At present we don't understand this problem or why it only happens on certain nodes. A workaround is to use the old ssh-based method of starting jobs, as implemented in the script <tt>/soft/bin/torque-mv</tt>. This script is less flexible and less well integrated with Torque than mpiexec, but it does work. Note that mpiexec still works, and is still the recommended method, for jobs using MPICH2.
9827dc9142b830a47acb37c33cc1b4fa2f6d914a
163
155
2010-12-06T12:14:31Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Nodes 001-008 of the main cluster are powered off, and have been for some months. This is because the air conditioning capacity in the server room is not adequate. We are working with Estates to solve this problem. As of Dec 2010, Estates have accepted that they are responsible for the problem and are planning to rectify it by installing additional ACUs.
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes and will only be solved by an upgrade.
* The Fedora distribution on the nodes (FC12) is rather old in general and should probably be upgraded.
* There is a problem on certain nodes whereby mpiexec fails to start MVAPICH2 jobs: errors of the following form are seen:
<pre>
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(311).......: Initialization failed
MPID_Init(191)..............: channel initialization failed
MPIDI_CH3_Init(163).........:
MPIDI_CH3I_RDMA_init(184)...:
rdma_setup_startup_ring(373): cannot create cq
</pre>
At present we don't understand this problem or why it only happens on certain nodes. A workaround is to use the old ssh-based method of starting jobs, as implemented in the script <tt>/soft/bin/torque-mv</tt>. This script is less flexible and less well integrated with Torque than mpiexec, but it does work. Note that mpiexec still works, and is still the recommended method, for jobs using MPICH2.
3687ea24477f9e96c3bf35a398dbf9dda5162861
170
163
2011-02-03T11:53:49Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes and will only be solved by an upgrade.
* The Fedora distribution on the nodes (FC12) is rather old in general and should probably be upgraded.
* There is a problem on certain nodes whereby mpiexec fails to start MVAPICH2 jobs: errors of the following form are seen:
<pre>
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(311).......: Initialization failed
MPID_Init(191)..............: channel initialization failed
MPIDI_CH3_Init(163).........:
MPIDI_CH3I_RDMA_init(184)...:
rdma_setup_startup_ring(373): cannot create cq
</pre>
At present we don't understand this problem or why it only happens on certain nodes. A workaround is to use the old ssh-based method of starting jobs, as implemented in the script <tt>/soft/bin/torque-mv</tt>. This script is less flexible and less well integrated with Torque than mpiexec, but it does work. Note that mpiexec still works, and is still the recommended method, for jobs using MPICH2.
77c4c2873799105dbc2e7dc32c576e1d2d374bb7
171
170
2011-02-03T11:55:20Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes and will only be solved by an upgrade.
* The Fedora distribution on the nodes (FC12) is rather old in general and should probably be upgraded.
* There is a problem on certain nodes whereby mpiexec fails to start MVAPICH2 jobs: errors of the following form are seen:
<pre>
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(311).......: Initialization failed
MPID_Init(191)..............: channel initialization failed
MPIDI_CH3_Init(163).........:
MPIDI_CH3I_RDMA_init(184)...:
rdma_setup_startup_ring(373): cannot create cq
</pre>
At present we don't understand this problem or why it only happens on certain nodes. A workaround is to use the old ssh-based method of starting jobs, as implemented in the script <tt>/soft/bin/torque-mv</tt>. This script is less flexible and less well integrated with Torque than mpiexec, but it does work. Note that mpiexec still works, and is still the recommended method, for jobs using MPICH2.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity. We hope that an upgrade of the kernels (see above) will solve this problem too.
4b1357d151da9c4b0161395616ff34f29cf3fd90
199
171
2011-03-24T04:00:19Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes and will only be solved by an upgrade.
* The Fedora distribution on the nodes (FC12) is rather old in general and should probably be upgraded.
* There is a problem on certain nodes whereby mpiexec fails to start MVAPICH2 jobs: errors of the following form are seen:
<pre>
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(311).......: Initialization failed
MPID_Init(191)..............: channel initialization failed
MPIDI_CH3_Init(163).........:
MPIDI_CH3I_RDMA_init(184)...:
rdma_setup_startup_ring(373): cannot create cq
</pre>
At present we don't understand this problem or why it only happens on certain nodes. A workaround is to use the old ssh-based method of starting jobs, as implemented in the script <tt>/soft/bin/torque-mv</tt>. This script is less flexible and less well integrated with Torque than mpiexec, but it does work. Note that mpiexec still works, and is still the recommended method, for jobs using MPICH2.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity. We hope that an upgrade of the kernels (see above) will solve this problem too.
See also a list of [[actions for upgrade]].
590859734f2843f81d95bddd8723f54f3e90c3ef
211
199
2011-04-22T21:21:48Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA (although they should be), and so their latency and bandwidth are worse than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+Infiniband is unstable.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity.
See also a list of [[actions for upgrade]].
e0b11cde708a7c1ae72c6788bbfcc5db31de7a40
Policies
0
4
157
47
2010-10-15T07:28:09Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing.
* The normal method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks).
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* We recommend that you write code with an awareness of the physical memory available in each machine (see [[Architecture]]). More importantly, if you are likely to exceed the physical memory limits, which may push the Linux kernel into killing random processes, please make sure that you do not do this on any machine that may be shared with others -- i.e., make sure that you have exclusive use of any node on which you are going to take such risks.
211f1cabb85994a6458a9b3201763bbd4f5b38a2
162
157
2010-11-23T15:58:27Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing.
* The normal method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks).
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* We recommend that you write code with an awareness of the physical memory available in each machine (see [[Architecture]]). More importantly, if you are likely to exceed the physical memory limits, which may push the Linux kernel into killing random processes, please make sure that you do not do this on any machine that may be shared with others -- i.e., make sure that you have exclusive use of any node on which you are going to take such risks.
* There is a fair-share policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that if you have been making heavy use of the cluster on a timescale of days (to be exact, a weighted integral over the past week is taken), future jobs will be given a lower priority; you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on this timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
4a9d2c74d656541e5cbf7677986292d123f6ba0a
207
162
2011-04-22T20:39:30Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing.
* The normal method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly unless you have been given specific authorization to do so (e.g. if you have got approval to dedicate one or more nodes to data reduction tasks). Even then, it would be best to use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* We recommend that you write code with an awareness of the physical memory available in each machine (see [[Architecture]]). More importantly, if you are likely to exceed the physical memory limits, which may push the Linux kernel into killing random processes, please make sure that you do not do this on any machine that may be shared with others -- i.e., make sure that you have exclusive use of any node on which you are going to take such risks.
* There is a fair-share policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that if you have been making heavy use of the cluster on a timescale of days (to be exact, a weighted integral over the past week is taken), future jobs will be given a lower priority; you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on this timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
f9d9447d14aa2e13ea8216b0e1ac4947c0818785
Queues
0
15
158
139
2010-11-04T14:42:46Z
Mjh
2
wikitext
text/x-wiki
There are five possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters.
* Finally 'all' submits to all 80 nodes and the two SMP machines, with a maximum wall time of 2 hours; note that MPI over Infiniband will not work if you request nodes spanning both the main and CAIR clusters; MPI over Infiniband does not work at all on the [[SMP machines]]. This queue should only be used in unusual circumstances -- normally it will be better to use one of the others.
== Default wall times ==
The default wall time limitation for the 'all' and 'cair_s' queues is also the maximum, i.e. 2 hours and 6 hours respectively.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
6699f8020c3e685d3077debc35dba6e760610668
160
158
2010-11-17T13:03:23Z
Mjh
2
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters.
== Default wall times ==
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
aa400a9a90b391262e6c74d1c882301fdcd54262
161
160
2010-11-17T13:08:03Z
Mjh
2
wikitext
text/x-wiki
There are four possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. There are no other limitations on this queue.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters.
== Default wall times ==
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
091a3fcb9383b60fffac6460ff84a8444f7abe4d
187
161
2011-03-10T15:19:22Z
Mjh
2
wikitext
text/x-wiki
There are five possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
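The queue and the wall time are requested with the usual <tt>qsub</tt> options (see [[jobs]]); for example (the script name is illustrative):
<pre>
# half-hour test job on the sandbox queue
qsub -q sandbox -l walltime=00:30:00 -l nodes=1:ppn=8 myjob.sh
# long job on the cair_l queue (restricted to CAIR users); a wall time estimate should still be given
qsub -q cair_l -l walltime=72:00:00 -l nodes=8:ppn=8 myjob.sh
</pre>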
06abe04575ba7e6978c923a289acbac70bbc71cd
SMP machines
0
24
159
142
2010-11-04T14:44:11Z
Mjh
2
wikitext
text/x-wiki
The SMP machines are two 4-processor, 48-core systems, each with 256 GB of RAM.
The individual cores in these machines are slightly slower (~20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes. Users are therefore recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
The big advantage of the SMP machines is the large amount of physical memory visible to all 48 cores. This allows for multi-threaded, shared-memory applications.
The SMP machines also each have 10 TB of local scratch disc space (mounted as /scratch).
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque. We will also permit direct login in certain circumstances. Please discuss your requirements with us.
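For example, a shared-memory job wanting a whole SMP machine for 12 hours might be submitted like this (the script name is illustrative, and ppn=48 assumes the smp nodes expose all 48 cores as job slots):
<pre>
qsub -q smp -l nodes=1:ppn=48 -l walltime=12:00:00 my-smp-job.sh
</pre>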
== Restrictions ==
The SMP machines are running FC13 and Linux 2.6.35-rc4 (!). This is slightly different from other nodes of the cluster; please be alert to problems this may cause.
Infiniband-aware MPI code will not run on the SMP machines, as the libraries are not yet installed (the underlying OS is too new for the OFED packages to compile). Since it is probably not sensible to run jobs spanning the main cluster and the smp machines because of the difference in processing speeds, this should not be a serious restriction.
82a8eaa224c4eea457f4fb18547ba793fa508017
Jobs
0
9
169
144
2011-01-06T10:30:23Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the second request above would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
1f23e1c506d4e9850bf88b9a6e7730dc14220552
172
169
2011-02-08T14:35:38Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the second request above would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
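For example, using job IDs like those in the listing above (the wall time and destination queue are illustrative):
<pre>
qdel 1770                          # remove job 1770 from the queue
qalter -l walltime=2:00:00 1771    # change the wall time request of queued job 1771
qmove main 1772                    # move queued job 1772 to the 'main' queue
</pre>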
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how each job behaves: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that they can safely run at the same time (for example, that they do not all write to the same files).
A common mistake is to give <tt>-t</tt> a single number when a range is intended.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
25a1865f14f2c3a951d62ded4a52fded64e2f9ed
203
172
2011-03-28T18:43:40Z
Mjh
2
/* Basic commands */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of the system, and code that depends on it, rely on being able to use ssh or scp between nodes.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource requests with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how each job behaves: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that they can safely run at the same time (for example, that they do not all write to the same files).
A common mistake is to give <tt>-t</tt> a single number when a range is intended.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
e329ead361b391e4c839da675b984b0c9c09e544
210
203
2011-04-22T21:20:35Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of the system, and code that depends on it, rely on being able to use ssh or scp between nodes.
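As a rough sketch only (the [[passwordless ssh]] page has the definitive local instructions, and this assumes your home directory is shared across the nodes so that a single authorized_keys file covers them all), the usual recipe looks like:
<pre>
## generate a key pair; press Enter at the passphrase prompts to leave it empty
ssh-keygen -t rsa
## authorise the new public key for logins to any node sharing this home directory
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
</pre>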
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
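If what you actually want to run is a single binary, a minimal wrapper script is enough (the program name and arguments here are placeholders):
<pre>
#!/bin/sh
## qsub needs a script, so this wrapper just changes to the submission
## directory and runs the binary there
cd $PBS_O_WORKDIR
./my-program arg1 arg2
</pre>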
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource requests with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
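For example (the job IDs and queue name here are purely illustrative):
<pre>
qdel 1770                        ## remove job 1770 from the queue
qalter -l walltime=2:00:00 1771  ## change the walltime request of queued job 1771
qmove other_queue 1772           ## move job 1772 to the queue named other_queue
</pre>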
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how each job behaves: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that they can safely run at the same time (for example, that they do not all write to the same files).
A common mistake is to give <tt>-t</tt> a single number when a range is intended.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
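As a minimal sketch of how a script might use the variable (the program and file names here are placeholders), each job in the array selects its own input and output:
<pre>
#!/bin/sh
#PBS -N mc-array
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
## each job in the array receives a different value of PBS_ARRAYID,
## used here to choose distinct input and output files
cd /home/fred/my_working_directory
./my-monte-carlo input.${PBS_ARRAYID} > output.${PBS_ARRAYID}
</pre>
Submitted with <tt>qsub -t 1-4 myjob.qsub</tt>, this queues four independent single-processor jobs.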
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
babeb6cb3182406bd4e7d00bf3e2dfd9f6a8d751
Software
0
17
173
114
2011-02-09T15:02:43Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in /soft/aips
* <u>[[CASA]]</u>: 3.1.0 installed in /soft/casapy-31.0.13530-002-64b
73bb54a4cb6985265678a426dba2f3f030aac0dd
177
173
2011-02-09T15:14:45Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in /soft/aips
* <u>[[CASA]]</u>: 3.1.0 installed in /soft/casapy-31.0.13530-002-64b
4f3ef4c8ce8112fdac5324f8ad68784dbcb26de9
197
177
2011-03-23T15:53:13Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in /soft/aips
* <u>[[CASA]]</u>: 3.1.0 installed in /soft/casapy-31.0.13530-002-64b
* <u>[[IDL]]</u>: in /soft/idl/idl/bin
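To use one of these packages you will usually need to put the relevant directory on your PATH, either in your shell startup file or in your job script; for example, for the IDL location above (sh/bash syntax):
<pre>
export PATH=/soft/idl/idl/bin:${PATH}
</pre>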
16000f4c2480e51e91e71a7d87d8ecb9df1933af
AIPS
0
27
174
2011-02-09T15:09:06Z
Mjh
2
Created page with 'AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips . To use aips you will need to be in the aipsuser group. From the head node, l…'
wikitext
text/x-wiki
AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips .
To use aips you will need to be in the aipsuser group.
From the head node, log in to the machine you have been instructed to use -- aips cannot be used through the batch job system -- using e.g.
<tt>ssh -X smp1</tt>. Then do <tt>/soft/aips/START_AIPS tv=local</tt>. Disc 1 will be a local disc. Optionally, do <tt>/soft/aips/START_AIPS tv=local da=stri-cluster</tt> to get access to the cluster data area -- but you are recommended not to try to use this for data reduction.
870b53dfea61ca729328e780487e8f287064d22e
175
174
2011-02-09T15:09:28Z
Mjh
2
wikitext
text/x-wiki
AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips .
To use aips you will need to be in the aipsuser group.
From the head node, log in to the machine you have been instructed to use -- aips cannot be used through the batch job system -- using e.g.
<tt>ssh -X smp1</tt>. Then do <tt>/soft/aips/START_AIPS tv=local</tt>. Disc 1 will be a local disc. Optionally, do <tt>/soft/aips/START_AIPS tv=local da=stri-cluster</tt> to get access to the cluster data area -- but you are recommended not to try to use this for data reduction.
08296865773817205b810e728b6fcb84cc9750ed
CASA
0
28
176
2011-02-09T15:13:33Z
Mjh
2
Created page with 'CASA is software for radio astronomy data reduction. It is installed on the cluster at /soft/casapy-31.0.13530-002-64b/ . To use casa, do <tt>setenv PATH /soft/casapy-31.0.13530…'
wikitext
text/x-wiki
CASA is software for radio astronomy data reduction. It is installed on the cluster at /soft/casapy-31.0.13530-002-64b/ .
To use casa, do <tt>setenv PATH /soft/casapy-31.0.13530-002-64b:$PATH</tt> and then run it with <tt>casapy</tt>.
You should not run CASA on the head node: either run it through the batch job system or log into a node that you have been assigned for interactive use.
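The <tt>setenv</tt> command above is csh/tcsh syntax; the bash equivalent would be:
<pre>
export PATH=/soft/casapy-31.0.13530-002-64b:$PATH
casapy
</pre>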
85931352a08f3710286a57981d1c1b9e7db38a64
Acknowledgements
0
29
179
2011-02-18T09:07:14Z
Mjh
2
Created page with 'If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it. We don't insist on any explicit form…'
wikitext
text/x-wiki
If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it.
We don't insist on any particular form of words, but an example might be 'This work has made use of the University of Hertfordshire Science and Technology Research Institute high-performance computing facility.'
The cluster doesn't really have an outward-facing web presence, but [http://star.herts.ac.uk/progs/computing.html] might be of some use to some people.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Bibliography]] page.
8b02997bce791b69736c8a4cade949980007c01e
180
179
2011-02-18T09:07:40Z
Mjh
2
wikitext
text/x-wiki
If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it.
We don't insist on any particular form of words, but an example might be 'This work has made use of the University of Hertfordshire Science and Technology Research Institute high-performance computing facility.'
The cluster doesn't really have an outward-facing web presence, but [http://star.herts.ac.uk/progs/computing.html] might be of some use to some people.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Cluster bibliography]] page.
1c32780b985605c824693f4a3a30a6e17cd27a7d
Cluster bibliography
0
30
181
2011-02-18T09:09:21Z
Mjh
2
Created page with 'Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. * Hardcastle MJ, Croston JH, Modelling TeV gamm…'
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, submitted to MNRAS, 2011 Feb
e5b83df9e8f495e07df875bfe03f47eba14db8b8
182
181
2011-02-18T11:31:56Z
Mjh
2
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, submitted to MNRAS, 2011 Feb
b27d0993585254041b56190c96d900aefee67b67
183
182
2011-02-18T12:37:06Z
Gsousa
6
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, submitted to MNRAS, 2011 Feb
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at BMC Neuroscience 2010
dd4d54d2c542325113c4b5e59d04ceb72d00185e
184
183
2011-02-18T12:41:52Z
Karen
7
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, submitted to MNRAS, 2011 Feb
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at BMC Neuroscience 2010
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at BMC Neuroscience 11, P92
553945b76108dcab3600f17e7c7ea4eb52d22bd4
190
184
2011-03-16T20:30:35Z
Mjh
2
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, accepted by MNRAS, 2011 Mar
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in BMC Neuroscience 2010
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in BMC Neuroscience 11, P92
891cac386a82e7e2441751ec933b1eb81071f8db
Web server
0
32
189
2011-03-16T20:29:27Z
Mjh
2
Created page with 'If you create a directory <tt>public_html</tt> in your home directory, then its contents are visible at <tt>http://stri-cluster.herts.ac.uk/~your-username/</tt>. Like all other s…'
wikitext
text/x-wiki
If you create a directory <tt>public_html</tt> in your home directory, then its contents are visible at <tt>http://stri-cluster.herts.ac.uk/~your-username/</tt>. Like all other stri-cluster web pages, this is not visible outside the University, but you may use this facility to export data etc. within the University.
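For example (the file name is a placeholder, and you may need to make the directory and files world-readable for the web server to serve them):
<pre>
mkdir -p ~/public_html
chmod 755 ~/public_html
cp my-results.tar.gz ~/public_html/
chmod 644 ~/public_html/my-results.tar.gz
</pre>
The file would then be visible at <tt>http://stri-cluster.herts.ac.uk/~your-username/my-results.tar.gz</tt>.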
c948dde882b27f8f96b6bf6cb406be0bc6eb70f5
Modules
0
33
192
2011-03-21T21:12:40Z
Mjh
2
Created page with 'The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment se…'
wikitext
text/x-wiki
The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment settings are added to the start of your PATH, MANPATH, LD_LIBRARY_PATH etc, and other variables used by the relevant package may be set as well. When a module is unloaded, the changes made to environment variables are undone.
Documentation of this package is available [http://modules.sourceforge.net/ online], or type <tt>man module</tt>.
Basic commands include:
* <tt>module list</tt>. See what modules you have loaded.
* <tt>module avail</tt>. List what modules are available to you.
* <tt>module load [modulename]</tt>. Load a module, e.g. <tt>module load mpich2-local</tt>
* <tt>module unload [modulename]</tt>. Unload a module.
Modules currently available are:
* <tt>mpich2-x86_64</tt>: Fedora standard implementation of MPICH2. Loaded by default.
* <tt>mpich2-local</tt>: Local version of mpich2. Will generally be more up-to-date than <tt>mpich2-x86_64</tt>.
* <tt>mpich2-intel</tt>: A version of mpich2 compiled with the Intel compiler.
* <tt>mvapich2</tt>: The MVAPICH2 implementation of [[MPI]].
* <tt>OpenMPI</tt>: The OpenMPI implementation of [[MPI]].
You may use <tt>module</tt> commands in your .bashrc or .cshrc. For example, I have
<pre>
module unload mpich2-x86_64
module load mpich2-local
</pre>
as the first two lines of my .cshrc.
We are happy to add other environments as modules -- please contact the cluster [[Administrators]].
79c51d1143103a668739a60dcb417fdced66b7c6
193
192
2011-03-21T21:26:16Z
Mjh
2
wikitext
text/x-wiki
The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment settings are added to the start of your PATH, MANPATH, LD_LIBRARY_PATH etc, and other variables used by the relevant package may be set as well. When a module is unloaded, the changes made to environment variables are undone.
Documentation of this package is available [http://modules.sourceforge.net/ online], or type <tt>man module</tt>.
Basic commands include:
* <tt>module list</tt>. See what modules you have loaded.
* <tt>module avail</tt>. List what modules are available to you.
* <tt>module load [modulename]</tt>. Load a module, e.g. <tt>module load mpich2-local</tt>
* <tt>module unload [modulename]</tt>. Unload a module.
Modules currently available are:
* <tt>mpich2-x86_64</tt>: Fedora standard implementation of MPICH2. Loaded by default.
* <tt>mpich2-local</tt>: Local version of mpich2. Will generally be more up-to-date than <tt>mpich2-x86_64</tt>.
* <tt>mpich2-intel</tt>: A version of mpich2 compiled with the Intel compiler.
* <tt>mvapich2</tt>: The MVAPICH2 implementation of [[MPI]].
* <tt>OpenMPI</tt>: The OpenMPI implementation of [[MPI]].
You may use <tt>module</tt> commands in your .bashrc or .cshrc. For example, I have
<pre>
module unload mpich2-x86_64
module load mpich2-local
</pre>
as the first two lines of my .cshrc.
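Module commands can also be used inside a batch [[jobs|job]] script, before the code that needs them. A minimal sketch (assuming an sh/bash script, the modules listed above, and <tt>my-mpi-code</tt> as a placeholder):
<pre>
#!/bin/sh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=01:00:00
## if the module command is not defined in non-interactive shells,
## source the environment-modules init script first (typical location):
## . /etc/profile.d/modules.sh
module unload mpich2-x86_64
module load mpich2-local
cd $PBS_O_WORKDIR
/usr/local/bin/mpiexec my-mpi-code
</pre>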
We are happy to add other environments as modules -- please contact the cluster [[Administrators]].
d9542746c51cb5ab8fa71561d7050ca8cafe8de4
Interactive jobs
0
35
208
2011-04-22T21:18:59Z
Mjh
2
Created page with 'Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system g…'
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case strongly discouraged by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, if possible, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@stri-cluster ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=48 -I -q smp
</pre>
will reserve all 48 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
234d72f2f705a765b736a5ed2f98f9b14c87def3
209
208
2011-04-22T21:19:24Z
Mjh
2
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case strongly discouraged by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, if possible, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@stri-cluster ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=48 -I -q smp
</pre>
will reserve all 48 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
033b32d8a8078fd1e0db4990ca9631372bedbfe4
Parallelization
0
14
213
46
2011-04-23T07:42:41Z
Mjh
2
wikitext
text/x-wiki
It is very important to realise that running a job on the cluster does not automatically give you access to all the CPU resources of the cluster (or even of the subset of the cluster you request from the [[jobs|job control system]]).
There is one, and only one, situation in which you can run normal, single-threaded code on the cluster and get an improvement in performance. That is the situation in which ''you can usefully run multiple copies of the same program simultaneously without any communication between the different copies''; in other words, you are only limited by how many CPUs you can get access to. Examples might include Monte Carlo simulation or some data analysis tasks. Problems of this kind are called ''embarrassingly parallel''. Provided that your code is thread-safe — that is, it doesn't have implementation errors that prevent it being run several times simultaneously, such as use of temporary files with the same name — you can use the cluster for this sort of problem without modifying your code. You may be able to use the [[job control system|jobs]] with commands such as <tt>pbsdsh</tt>, or you may need to run an [[interactive jobs|interactive job]].
Once you need the different parts of your code to communicate, you in general ''must'' use code that is written specifically to do so. At present [[MPI]] is the method provided to do this. If you are planning to run an application written by a third party, it needs to use [[MPI]] if you expect to be able to run on more than one node. If you are running your own code, you will need to modify it, often very substantially, to allow parallel execution. '''This is your responsibility, not that of the cluster administrators.'''
961028ae7e5bf4b8c451f117b5317b658293a40b
Parallelization
0
14
214
213
2011-04-23T07:43:09Z
Mjh
2
wikitext
text/x-wiki
It is very important to realise that running a job on the cluster does not automatically give you access to all the CPU resources of the cluster (or even of the subset of the cluster you request from the [[jobs|job control system]]).
There is one, and only one, situation in which you can run normal, single-threaded code on the cluster and get an improvement in performance. That is the situation in which ''you can usefully run multiple copies of the same program simultaneously without any communication between the different copies''; in other words, you are only limited by how many CPUs you can get access to. Examples might include Monte Carlo simulation or some data analysis tasks. Problems of this kind are called ''embarrassingly parallel''. Provided that your code is thread-safe — that is, it doesn't have implementation errors that prevent it being run several times simultaneously, such as use of temporary files with the same name — you can use the cluster for this sort of problem without modifying your code. You may be able to use the [[jobs|job control system]] with commands such as <tt>pbsdsh</tt>, or you may need to run an [[interactive jobs|interactive job]].
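As an illustrative sketch of the embarrassingly parallel case (the script and file names are placeholders, and it assumes, as in Torque, that <tt>pbsdsh</tt> sets <tt>$PBS_VNODENUM</tt> to a distinct number for each copy it spawns):
<pre>
#!/bin/sh
## worker.sh: one independent simulation per allocated processor;
## each copy picks its own input and output using PBS_VNODENUM
cd /home/fred/my_working_directory
./my-simulation input.${PBS_VNODENUM} > output.${PBS_VNODENUM}
</pre>
The job script then simply calls <tt>pbsdsh /home/fred/my_working_directory/worker.sh</tt> (make worker.sh executable first, and give the full path, since the spawned copies may not start in your working directory).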
Once you need the different parts of your code to communicate, you in general ''must'' use code that is written specifically to do so. At present [[MPI]] is the method provided to do this. If you are planning to run an application written by a third party, it needs to use [[MPI]] if you expect to be able to run on more than one node. If you are running your own code, you will need to modify it, often very substantially, to allow parallel execution. '''This is your responsibility, not that of the cluster administrators.'''
08f0125bc8db50e23b1c4e16b7188ddb9db366ca
Interactive jobs
0
35
215
209
2011-04-23T07:48:09Z
Mjh
2
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case strongly discouraged by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, if possible, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@stri-cluster ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=48 -I -q smp
</pre>
will reserve all 48 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== Specific machines ===
It is possible to request a specific machine just as for normal non-interactive [[jobs]]:
<pre>
qsub -l walltime=00:01:00 -l nodes=smp2:ppn=48 -I -q smp
</pre>
Do this if you know that you need to use some particular feature of the machine, such as the [[SMP machines]]' scratch discs.
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
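For example (the walltime and queue are illustrative):
<pre>
qsub -X -I -l walltime=01:00:00 -q main
</pre>
You will also need X forwarding enabled on your connection to the head node (e.g. <tt>ssh -X</tt>) for windows to reach your display.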
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
a96afda426acd8573deb7c6dab464a78029ad7b4
SMP machines
0
24
216
159
2011-04-23T07:49:01Z
Mjh
2
wikitext
text/x-wiki
The SMP machines are two 4-processor, 48-core systems each with 256 Gb of RAM.
The individual cores in these machines are slightly slower (~ 20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes. So users are recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
The big advantage of the SMP machines is the large amount of physical memory visible to all 48 cores. This allows for multi-threaded, shared-memory applications.
The SMP machines also each have 10 Tb of local scratch disc space (mounted as /scratch).
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque.
e7b6d5eaf2f9b871b1975e7e8de2a18ed237d3a9
Architecture
0
7
217
198
2011-04-23T07:50:33Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* Ethernet and infiniband switches to provide connectivity.
The nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
http://stri-cluster.herts.ac.uk/cluster.jpg
f24b3509a63a9e0bcb02990e466a2e71c3278f73
219
217
2011-04-23T07:55:43Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 80 compute nodes (or just 'nodes'), of which
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form the CAIR cluster (chassis 4 and 5)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: so there are 5 chassis in total, 3 in the main cluster and 2 in the CAIR cluster. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the nodes and storage, but does not reflect the current physical configuration of the cluster.
http://stri-cluster.herts.ac.uk/cluster.jpg
0cd085a8760619e9e9c94839acd70f990677153d
228
219
2011-05-26T17:18:42Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 96 compute nodes (or just 'nodes'), of which
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster and the CAR cluster (chassis 6)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 6 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows some of the nodes and storage, but does not reflect the current physical configuration of the cluster.
http://stri-cluster.herts.ac.uk/cluster.jpg
5903edbf214aac09aa217da9f2f5786e1829ddf0
244
228
2011-09-06T12:08:29Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 96 compute nodes (or just 'nodes'), of which
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the CAR cluster (chassis 7)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 7 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows some of the nodes and storage, but does not reflect the current physical configuration of the cluster.
http://stri-cluster.herts.ac.uk/cluster.jpg
849a9336f5dddf6ec5bc692e68f45008ad73f8dc
248
244
2011-09-21T16:54:39Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 96 compute nodes (or just 'nodes'), of which
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the CAR cluster (chassis 7)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 110 Tb of [[storage]] attached via Fibre Channel to the head node
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 7 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
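To see how an individual node is labelled by the queueing system (its state, core count and properties such as the chassis name), the Torque <tt>pbsnodes</tt> command can be used; the node name below is illustrative and the exact output format depends on the Torque version:
<pre>
# Show one node's state, number of processors and properties.
pbsnodes node001
# Or list the property lines for all nodes.
pbsnodes -a | grep properties
</pre>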
The image below shows the current physical layout of the cluster components.
http://stri-cluster.herts.ac.uk/cluster2.jpg
8f2fb0bf2dee6c4cc8e9d49a05ba53bf48c9059e
Networking
0
10
218
143
2011-04-23T07:53:37Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected via a main ethernet switch. The links connecting the head node and the SMP machines, and the uplinks from the two node internal ethernet stacks, are carried by multiple physical cables via link aggregation. A separate physical ethernet network carries management traffic.
There are in fact two infiniband networks: one for the main cluster, which is mostly dual data-rate (nominal 2.5 Gbit/s) and one for the CAIR cluster, which is quad data-rate (5.0 Gbit/s). As for the ethernet, each chassis has an internal Infiniband switch. The two sub-clusters each have an external Infiniband switch which connects the chassis that make up the cluster and the head node is connected, separately, to both of those switches. Therefore there is no native infiniband connectivity between the two sub-clusters. IP over Infiniband can be used to connect between them, but this should not be relied on for large data volumes as it must be routed via the head node.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is somewhat lower and data transfer rates somewhat higher between nodes in the same chassis than between nodes in different chassis in the same cluster, ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still, and native Infiniband connections between the two sub-clusters are essentially impossible. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all reach the public Internet via IP masquerading on the head node over the ethernet network. (For this reason, it is not sensible to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx or 4.xxx, where the 3 and 4 refer to the main and CAIR clusters respectively). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The two SMP machines are attached to the Infiniband network of the main cluster and are called smp1 and smp2: thus they have addresses smp1.data, smp1.infi etc.
7e2292b696771b22f430dd85f5e06b8c6ff4ab55
245
218
2011-09-06T12:09:44Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected via a main ethernet switch. The links connecting the head node and the SMP machines, and the uplinks from the two node internal ethernet stacks, are carried by multiple physical cables via link aggregation. A separate physical ethernet network carries management traffic.
There are in fact two infiniband networks: one for the main and CAR clusters, which is mostly dual data-rate (nominal 2.5 Gbit/s) and one for the CAIR cluster, which is quad data-rate (5.0 Gbit/s). As for the ethernet, each chassis has an internal Infiniband switch. The two sub-clusters each have an external Infiniband switch which connects the chassis that make up the cluster and the head node is connected, separately, to both of those switches. Therefore there is no native infiniband connectivity between the two sub-clusters. IP over Infiniband can be used to connect between them, but this should not be relied on for large data volumes as it must be routed via the head node.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is somewhat lower and data transfer rates somewhat higher between nodes in the same chassis than between nodes in different chassis in the same cluster, ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still, and native Infiniband connections between the two sub-clusters are essentially impossible. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all reach the public Internet via IP masquerading on the head node over the ethernet network. (For this reason, it is not sensible to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx or 4.xxx, where the 3 and 4 refer to the main/CAR and CAIR clusters respectively). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The two SMP machines are attached to the Infiniband network of the main/CAR cluster and are called smp1 and smp2: thus they have addresses smp1.data, smp1.infi etc.
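To check which address a name resolves to on each network (node001 here is just an example), standard tools can be used from the head node or any compute node:
<pre>
# Resolve the ethernet and Infiniband (IP over IB) addresses of a node.
getent hosts node001.data
getent hosts node001.infi
# A rough latency comparison between the two networks.
ping -c 3 node001.data
ping -c 3 node001.infi
</pre>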
5ca2e68e05a6649aa940155ccc18bc4135d2094b
MPI
0
12
220
194
2011-04-23T08:24:22Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
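Compiling is done with the wrapper compilers mentioned above, whichever implementation is selected; a minimal sketch (the source file names are hypothetical):
<pre>
# Compile an MPI C program with the currently selected implementation.
mpicc -O2 -o mympijob mympijob.c
# Fortran 77 codes use the corresponding wrapper.
mpif77 -O2 -o mympijob mympijob.f
</pre>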
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
=== MPICH2 (local versions) ===
Locally compiled, more up-to-date versions of MPICH2 are available. To use them, issue <tt>module</tt> commands:
<pre>
module unload mpich2-x86_64
module load mpich2-local
OR
module load mpich2-intel
</pre>
Then
<pre>
which mpicc
/soft/mpich2/bin/mpicc
</pre>
If you wish to use these permanently, put the module commands in your .cshrc or .bashrc.
Jobs compiled this way should also be run with /usr/local/bin/mpiexec.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available.
To use MVAPICH2 do
<pre>
module unload mpich2-x86_64
module load mvapich2
</pre>
Then you should see
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected via the [[modules]] system. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
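As a possible starting point only (this is not a tested or locally documented recipe), a job script might invoke OpenMPI's own <tt>mpirun</tt> with the Torque node file; the module name and program name are assumptions:
<pre>
#!/bin/sh
#PBS -N openmpi-demo
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
# Select OpenMPI via the modules system (module name is an assumption).
module load openmpi
# Start one process per allocated slot, honouring the Torque node list.
mpirun -np $(wc -l < $PBS_NODEFILE) -machinefile $PBS_NODEFILE ./my-mpi-code
</pre>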
22fd4dbffdd83b7f2f39407e40bbee2593b901d9
Jobs
0
9
221
210
2011-04-28T21:23:20Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). (Note that in the default <tt>qstat</tt> view the CPU time is not displayed correctly.) The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource requests with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to pass a single number to <tt>-t</tt> when a range is intended:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
808512aa9bac838e38a1b349548f68de440d3073
236
221
2011-07-17T11:50:09Z
Mjh
2
/* Basic commands */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource requests with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to pass a single number to <tt>-t</tt> when a range is intended:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
929b87fb6a1d3da1e22410554df71c0d6e9d8c4c
239
236
2011-08-17T13:26:01Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -o: specify where standard output/error should be stored, if not /home/user
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource requests with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
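For example (job numbers are illustrative):
<pre>
qdel 1770                          # remove job 1770 from the queue
qalter -l walltime=2:00:00 1771    # change a queued job's walltime request
qmove smp 1772                     # move job 1772 to the smp queue
</pre>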
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to determine what each individual job should do: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to run at the same time (for example, that they do not write to the same files).
A common mistake is to give <tt>-t</tt> a job count rather than a range of array IDs:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
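A minimal sketch of an array job script (the program and file names are hypothetical) which uses the array index to select its input and name its output:
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/fred/my_working_directory
# each job in the array reads a different input file and writes its own output
./montecarlo input.$PBS_ARRAYID > output.$PBS_ARRAYID
</pre>
Submitted as <tt>qsub -t 1-4 myjob.qsub</tt> above, this would queue four jobs processing input.1 to input.4.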
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
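Since qsub prints the identifier of the newly created job on standard output, a simple two-stage chain can be built from the shell (the script names here are hypothetical):
<pre>
FIRST=$(qsub stage1.qsub)
qsub -W depend=afterany:$FIRST stage2.qsub
</pre>
Use <tt>afterok</tt> instead of <tt>afterany</tt> if the second job should only run when the first one exits successfully.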
54af1e5872ae2461700434b1a4c3b3a0e75310ec
242
239
2011-08-17T13:38:50Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp between nodes.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be preserved (by default in your home directory).
* -j: specify whether the output and error streams should be kept separate or merged.
* -o: specify where standard output/error should be stored, if not /home/user
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a specification of the nodes to be allocated; several specifications may be joined with plus signs (see the examples below).
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
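For example, taking some of the queued jobs from the listing above (you can only do this to your own jobs, and changes with qalter and qmove generally apply to jobs that have not yet started):
<pre>
qdel 1770                          ## remove job 1770 from the queue
qalter -l walltime=10:00:00 1771   ## change the walltime request of queued job 1771
qmove smp 1772                     ## move queued job 1772 to the 'smp' queue
</pre>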
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to determine what each individual job should do: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to run at the same time (for example, that they do not write to the same files).
A common mistake is to give <tt>-t</tt> a job count rather than a range of array IDs:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
7d503ccbcc0d8e1815ca5e958c257a0172076df1
250
242
2011-10-18T15:55:43Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp between nodes.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and/or standard error streams of the job should be kept in your home directory as the job runs. If you specify this, you can watch the output as it is generated, but it will always appear in your home directory. If you do not, the output appears only once the job has finished, in the location given by -o and -e, or in the directory from which qsub was run if these are not specified.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory (see the example after this list)
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
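As a brief sketch of output handling with the -j and -o options (the log file path is hypothetical), the following directives merge the error stream into the output stream and write the combined file to a chosen location when the job finishes:
<pre>
#PBS -j oe
#PBS -o /home/fred/job-logs/myjob.out
</pre>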
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a specification of the nodes to be allocated; several specifications may be joined with plus signs (see the examples below).
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in: environment variables, aliases and the working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and source some startup files in your qsub script before you actually execute any code, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to determine what each individual job should do: for example, multiple runs of a Monte Carlo simulation might use it to store their output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to run at the same time (for example, that they do not write to the same files).
A common mistake is to give <tt>-t</tt> a job count rather than a range of array IDs:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
0b4a8c5c674c96f95ae6965d6f5dc53e1d4a86a0
Memory
0
36
222
2011-04-28T21:40:11Z
Mjh
2
Created page with 'Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script. As described in the section on [[architecture…'
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have 24 Gb of physical memory. If the total amount of memory used by all jobs running on a node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the node may even crash (as the Linux out-of-memory killer often appears to leave a node in an unstable state).
To try to avoid this situation arising, jobs submitted to the main [[queue]] are limited, by default, to no more than 1 Gb of physical memory per process. If your job exceeds this, it will be automatically killed. If you need more than this, you must request it explicitly using the <tt>pmem</tt> attribute in the job control system. <tt>pmem</tt> sets the amount of physical memory, per process, that your jobs will use. So, for example, if you need 8 Gb of memory per process, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
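The same request can equivalently be made on the qsub command line, for example (with a hypothetical script name):
<pre>
qsub -l nodes=8 -l walltime=00:01:00 -l pmem=8gb large-job.sh
</pre>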
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient.
1ea223c88c0636ec48dc80f207b679db7eb5b07e
223
222
2011-04-28T21:42:15Z
Mjh
2
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have 24 Gb of physical memory. If the total amount of memory used by all jobs running on a node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the node may even crash (as the Linux out-of-memory killer often appears to leave a node in an unstable state).
To try to avoid this situation arising, jobs submitted to the main [[queue]] are limited, by default, to no more than 1 Gb of physical memory per process. If your job exceeds this, it will be automatically killed. If you need more than this, you must request it explicitly using the <tt>pmem</tt> attribute in the job control system. <tt>pmem</tt> sets the amount of physical memory, per process, that your jobs will use. So, for example, if you need 8 Gb of memory per process, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient.
Users needing very large amounts of memory should consider using the [[SMP machines]], which have 256 Gb of physical memory. (If doing this, it is still sensible to use the <tt>pmem</tt> job attribute to make sure that you will not interfere with other users.)
783612000f84e816b06810ea94ef821e37b5f125
224
223
2011-04-28T21:42:44Z
Mjh
2
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have 24 Gb of physical memory. If the total amount of memory used by all jobs running on a node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the node may even crash (as the Linux out-of-memory killer often appears to leave a node in an unstable state).
To try to avoid this situation arising, jobs submitted to the main [[queues|queue]] are limited, by default, to no more than 1 Gb of physical memory per process. If your job exceeds this, it will be automatically killed. If you need more than this, you must request it explicitly using the <tt>pmem</tt> attribute in the job control system. <tt>pmem</tt> sets the amount of physical memory, per process, that your jobs will use. So, for example, if you need 8 Gb of memory per process, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient.
Users needing very large amounts of memory should consider using the [[SMP machines]], which have 256 Gb of physical memory. (If doing this, it is still sensible to use the <tt>pmem</tt> job attribute to make sure that you will not interfere with other users.)
279930f0708ee0d291aae663f00311c69b92b9b9
225
224
2011-04-28T21:56:40Z
Mjh
2
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have 24 Gb of physical memory. If the total amount of memory used by all jobs running on a node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the node may even crash (as the Linux out-of-memory killer often appears to leave a node in an unstable state).
To try to avoid this situation arising, jobs submitted to the main [[queues|queue]] are limited, by default, to no more than 1 Gb of physical memory per process. If your job exceeds this, it will be automatically killed. If you need more than this, you must request it explicitly using the <tt>pmem</tt> attribute in the job control system. <tt>pmem</tt> sets the amount of physical memory, per process, that your jobs will use. So, for example, if you need 8 Gb of memory per process, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient. Note that the typical cluster job runs very comfortably in 1 Gb. You can see how much physical memory a running job is using with <tt>qstat -f <jobid></tt>: the line <tt>resources_used.mem</tt> gives the total memory use for all processes.
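For example, for a (hypothetical) running job 1765:
<pre>
qstat -f 1765 | grep resources_used
</pre>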
Users needing very large amounts of memory should consider using the [[SMP machines]], which have 256 Gb of physical memory. (If doing this, it is still sensible to use the <tt>pmem</tt> job attribute to make sure that you will not interfere with other users.)
65a00886a896fb6aee2859724550de0b9837e95d
Queues
0
15
226
187
2011-04-28T22:03:54Z
Mjh
2
wikitext
text/x-wiki
There are five job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 32 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' is a special queue that submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
02deecebabea649e94d19cfc24fab7e2fc930448
229
226
2011-05-26T17:21:22Z
Mjh
2
wikitext
text/x-wiki
There are six job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 46 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 4 dedicated CAR machines. This queue is restricted to CAR users. It is not currently possible to run Infiniband jobs on the CAR machines.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
c82a5b69ba7b5a3f7ca3d13feeff83009bb0e229
230
229
2011-05-26T17:21:35Z
Mjh
2
wikitext
text/x-wiki
There are six job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 46 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 4 dedicated CAR machines. This queue is restricted to CAR users. It is not currently possible to run Infiniband jobs on the CAR machines.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
706e8e173e2730666a83e30ee8ae184049ccc26c
241
230
2011-08-17T13:32:03Z
Mjh
2
wikitext
text/x-wiki
There are six job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 46 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 4 dedicated CAR machines. This queue is restricted to CAR users.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
1330d451091183ba4337e23402f93e7fab9ceb40
243
241
2011-09-06T12:07:08Z
Mjh
2
wikitext
text/x-wiki
There are six job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 46 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 16 dedicated CAR machines. This queue is restricted to CAR users.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
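For example, a sketch of a three-day request on the main queue and a short test on the sandbox queue (the script names are hypothetical):
<pre>
qsub -q main -l walltime=72:00:00 long-job.sh    ## three days, within the one-week maximum for 'main'
qsub -q sandbox -l walltime=0:30:0 test-job.sh   ## half an hour of testing on the sandbox machines
</pre>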
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
2b5cc44b74f917dd53f77978494ccfb71d72a9c0
Policies
0
4
227
207
2011-05-20T11:35:07Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a fair-share policy, which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that if you have been making heavy use of the cluster on a timescale of days (to be exact, a weighted integral over the past week is taken), your future jobs will be given a lower priority; you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on this timescale of days. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
a06f9b8e42b458f58054a068fc477e830b2157ce
261
227
2012-01-14T09:07:03Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a fair-share policy, which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that if you have been making heavy use of the cluster on a timescale of days (to be exact, a weighted integral over the past week is taken), your future jobs will be given a lower priority; you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on this timescale of days. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
* If you are one of the group of people with exclusive or privileged use of a group of nodes, you must use those nodes in preference to the nodes of the main cluster if possible. You should not run jobs in the 'main' queue unless no nodes in your reserved group are available. This applies to members of CAIR and CAR.
901124238308950bef7f3cabfff1acbb9ae40c13
Known problems
0
25
231
211
2011-06-09T14:17:56Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* QDR infiniband uplinks between chassis4/5 and the infiniband switch in that rack do not work at the expected speed.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity.
481b46c059f96225b42113ba129bba251116e785
240
231
2011-08-17T13:30:39Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* QDR infiniband uplinks between chassis4/5/6 and the infiniband switch in that rack do not work at the expected speed. This affects the speed of jobs in the CAIR nodes that attempt to run over more than one chassis.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity.
4e7ec76a96bf370b80354369386f005b8ade078b
246
240
2011-09-12T07:19:33Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity.
008f8e20b6dd724b3da7accf8b42ceb79a2b08dd
252
246
2011-10-30T09:31:35Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* I/O intensive operations on the data or home directories can cause nodes to lock up in some situations. This only happens occasionally, but is clearly associated with NFS activity. We know of several workarounds so if your job regularly crashes nodes please discuss it with us.
4113badb4817ceff38fcdbef73c1743afe15b028
Cluster bibliography
0
30
232
190
2011-06-23T08:32:02Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kukol, A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, accepted by ''MNRAS'', '''2011''' Mar
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
15f7efb58265e1c079c09c5c4a9c8c4ea70c0f0d
235
232
2011-07-11T18:39:18Z
Mjh
2
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kukol, A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, accepted by ''MNRAS'', '''2011''' Mar
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
bbfa0de766db6a3593f7c7ebd1a586ca5be78f82
237
235
2011-07-29T08:34:45Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kalia M, Kukol, A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', doi:10.1016/j.jsb.2011.07.012
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, accepted by ''MNRAS'', '''2011''' Mar
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
d1dcd0876e8c8a5404c811e2352a56e024daff67
238
237
2011-07-29T08:35:08Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kalia M, Kukol A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', doi:10.1016/j.jsb.2011.07.012
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, accepted by ''MNRAS'', '''2011''' Mar
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
43ec64f6e55489860a0f45f467b48bf67cbd8c59
249
238
2011-10-12T13:42:56Z
Mjh
2
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kalia M, Kukol A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', doi:10.1016/j.jsb.2011.07.012
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', MNRAS 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
41941558c32d756aa57368a2fea03906b96a4828
Main Page
0
1
233
191
2011-07-08T08:49:44Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
b10a5ab89e22cce2e2bb80bd2b8ac02afec47b07
257
233
2012-01-07T09:35:06Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
378c8f2c083a95fe35c77ff2778916b1ad7750ee
262
257
2012-02-15T15:52:27Z
Mjh
2
/* Cluster basics */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
f7e7f2112da99fc812ef64eb982327d8164a3db6
Why doesn't my job run?
0
37
234
2011-07-08T09:32:42Z
Mjh
2
Created page with 'If you submit a [[jobs|job]] and it stays in the queue, it may not be clear why nothing is happening. Obviously one possibility is that there is a problem with the cluster: but i…'
wikitext
text/x-wiki
If you submit a [[jobs|job]] and it stays in the queue, it may not be clear why nothing is happening. Obviously one possibility is that there is a problem with the cluster: but it is also possible that things are going on that you are not seeing.
To ask the scheduler what is going on with a job you can use the command <tt>/usr/local/maui/bin/checkjob</tt>. A typical <tt>checkjob</tt> run might look like this (note the <tt>-v</tt> option):
<pre>
/usr/local/maui/bin/checkjob -v 123456
checking job 123456 (RM job '123456.stri-cluster.herts.ac.uk')
State: Idle
Creds: user:fred group:fred class:main qos:DEFAULT
WallTime: 00:00:00 of 7:00:00:00
SubmitTime: Fri Jul 8 09:04:48
(Time Queued Total: 00:38:52 Eligible: 00:38:52)
Total Tasks: 24
Req[0] TaskCount: 24 Partition: ALL
Network: [NONE] Memory >= 1024M Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [main]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeAccess: SHARED
TasksPerNode: 8 NodeCount: 3
IWD: [NONE] Executable: [NONE]
Bypass: 63 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 24.00 StartPriority: 2513
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 24 procs found)
idle procs: 732 feasible procs: 0
Rejection Reasons: [Features : 58][CPU : 22][State : 18][ReserveTime : 8]
Detailed Node Availability Information:
node001 rejected : ReserveTime
node002 rejected : ReserveTime
node003 rejected : ReserveTime
node004 rejected : State
node005 rejected : ReserveTime
node006 rejected : ReserveTime
node007 rejected : ReserveTime
node008 rejected : ReserveTime
node009 rejected : ReserveTime
node010 rejected : CPU
node011 rejected : CPU
node012 rejected : CPU
node013 rejected : State
node014 rejected : CPU
node015 rejected : CPU
node016 rejected : CPU
node017 rejected : State
node018 rejected : State
node019 rejected : State
node020 rejected : State
node021 rejected : State
node022 rejected : State
node023 rejected : State
node024 rejected : State
node025 rejected : State
node026 rejected : State
node027 rejected : State
node028 rejected : State
node029 rejected : State
node030 rejected : State
node031 rejected : State
node032 rejected : CPU
node033 rejected : CPU
node034 rejected : CPU
node035 rejected : CPU
node036 rejected : CPU
node037 rejected : CPU
node038 rejected : CPU
node039 rejected : CPU
node040 rejected : CPU
node041 rejected : State
node042 rejected : CPU
node043 rejected : CPU
node044 rejected : CPU
node045 rejected : CPU
node046 rejected : CPU
node047 rejected : CPU
node048 rejected : CPU
node049 rejected : Features
node050 rejected : Features
node051 rejected : Features
node052 rejected : Features
node053 rejected : Features
node054 rejected : Features
node055 rejected : Features
node056 rejected : Features
node057 rejected : Features
node058 rejected : Features
node059 rejected : Features
node060 rejected : Features
node061 rejected : Features
node062 rejected : Features
node063 rejected : Features
node064 rejected : Features
node065 rejected : Features
node066 rejected : Features
node067 rejected : Features
node068 rejected : Features
node069 rejected : Features
node070 rejected : Features
node071 rejected : Features
node072 rejected : Features
node073 rejected : Features
node074 rejected : Features
node075 rejected : Features
node076 rejected : Features
node077 rejected : Features
node078 rejected : Features
node079 rejected : Features
node080 rejected : Features
sandbox1 rejected : Features
sandbox2 rejected : Features
sandbox3 rejected : Features
sandbox4 rejected : Features
sandbox5 rejected : Features
sandbox6 rejected : Features
sandbox7 rejected : Features
sandbox8 rejected : Features
sandbox9 rejected : Features
sandbox10 rejected : Features
node081 rejected : Features
node082 rejected : Features
node083 rejected : Features
node084 rejected : Features
node085 rejected : Features
node086 rejected : Features
node087 rejected : Features
node088 rejected : Features
node089 rejected : Features
node090 rejected : Features
node091 rejected : Features
node092 rejected : Features
node093 rejected : Features
node094 rejected : Features
node095 rejected : Features
node096 rejected : Features
job cannot run in partition SMP (insufficient idle procs available: 0 < 24)
</pre>
How do you interpret all this output?
First of all, if your output does not look anything like this, for example if you see an error message about not being able to contact a server, then the scheduler is not running or there is some other problem. In that case, contact one of the [[administrators]].
Assuming your output looks like the above, first check that the details of your job agree with what you think you submitted: check <tt>NodeCount</tt>, <tt>TasksPerNode</tt> and the <tt>WallTime</tt> request.
Now go down to the line explaining why the <tt>job cannot run in partition DEFAULT</tt>. Normally this will be as above: it is telling you that there are not enough available CPUs for your job. But suppose you think the cluster is close to empty: why is your job still not running?
Look down the list of individual nodes to see why each one is rejecting your job. The example above shows the common reasons:
* Features: this means that you have asked for a feature that the node in question doesn't have. For example, jobs submitted to the main cluster will never run on the [[sandbox]] machines. It is normal for many nodes to reject your job for this reason.
* State: usually this means that the node in question is marked busy by the system, which means that it is running as many jobs as possible. In this example, there are a number of busy nodes. It can also mean that nodes are down, i.e. that there is a real problem. If many nodes reject your job because of 'State' but you think they cannot be busy, check <tt>pbsnodes -a</tt> to see if they are 'down' and report a problem if so.
* CPU: the node is not busy, but it has fewer available CPUs than you have requested. This is most likely because it is running one or more jobs. In the example above, the user has asked for 3 nodes with 8 CPUs each: the nodes rejecting the job because of CPU do not have 8 free CPUs. (If all nodes are rejecting your job for this reason, you should check whether you are asking for an impossible configuration. Jobs submitted to the main cluster asking for more than 8 CPUs per node will sit there forever waiting for a machine with a suitable number of cores to become available. Does your job really need 3 machines with 8 CPUs, or would it work with 24 CPUs distributed over the cluster?)
* ReserveTime: your job asks to run on the node at a time when it is reserved for another purpose. This could be due to another job ahead of you in the queue, or it could be a system reservation. If you see this message but there are no jobs ahead of you in the queue, check your e-mail for a recent message from the administrators regarding scheduled downtime.
If you are seeing all of these and you can convince yourself that the individual nodes are rejecting your jobs for valid reasons, then the only thing you can do is wait for the conditions that are blocking your job to clear. Only if you can't convince yourself of this should you contact the administrators.
0a4625ad8d20bbd43bcdbd5e461e4be5ce646794
260
234
2012-01-13T13:44:57Z
Mjh
2
wikitext
text/x-wiki
If you submit a [[jobs|job]] and it stays in the queue, it may not be clear why nothing is happening. Obviously one possibility is that there is a problem with the cluster: but it is also possible that things are going on that you are not seeing.
To ask the scheduler what is going on with a job you can use the command <tt>/usr/local/maui/bin/checkjob</tt>. A typical <tt>checkjob</tt> run might look like this (note the <tt>-v</tt> option):
<pre>
/usr/local/maui/bin/checkjob -v 123456
checking job 123456 (RM job '123456.stri-cluster.herts.ac.uk')
State: Idle
Creds: user:fred group:fred class:main qos:DEFAULT
WallTime: 00:00:00 of 7:00:00:00
SubmitTime: Fri Jul 8 09:04:48
(Time Queued Total: 00:38:52 Eligible: 00:38:52)
Total Tasks: 24
Req[0] TaskCount: 24 Partition: ALL
Network: [NONE] Memory >= 1024M Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [main]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeAccess: SHARED
TasksPerNode: 8 NodeCount: 3
IWD: [NONE] Executable: [NONE]
Bypass: 63 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 24.00 StartPriority: 2513
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 24 procs found)
idle procs: 732 feasible procs: 0
Rejection Reasons: [Features : 58][CPU : 22][State : 18][ReserveTime : 8]
Detailed Node Availability Information:
node001 rejected : ReserveTime
node002 rejected : ReserveTime
node003 rejected : ReserveTime
node004 rejected : State
node005 rejected : ReserveTime
node006 rejected : ReserveTime
node007 rejected : ReserveTime
node008 rejected : ReserveTime
node009 rejected : ReserveTime
node010 rejected : CPU
node011 rejected : CPU
node012 rejected : CPU
node013 rejected : State
node014 rejected : CPU
node015 rejected : CPU
node016 rejected : CPU
node017 rejected : State
node018 rejected : State
node019 rejected : State
node020 rejected : State
node021 rejected : State
node022 rejected : State
node023 rejected : State
node024 rejected : State
node025 rejected : State
node026 rejected : State
node027 rejected : State
node028 rejected : State
node029 rejected : State
node030 rejected : State
node031 rejected : State
node032 rejected : CPU
node033 rejected : CPU
node034 rejected : CPU
node035 rejected : CPU
node036 rejected : CPU
node037 rejected : CPU
node038 rejected : CPU
node039 rejected : CPU
node040 rejected : CPU
node041 rejected : State
node042 rejected : CPU
node043 rejected : CPU
node044 rejected : CPU
node045 rejected : CPU
node046 rejected : CPU
node047 rejected : CPU
node048 rejected : CPU
node049 rejected : Features
node050 rejected : Features
node051 rejected : Features
node052 rejected : Features
node053 rejected : Features
node054 rejected : Features
node055 rejected : Features
node056 rejected : Features
node057 rejected : Features
node058 rejected : Features
node059 rejected : Features
node060 rejected : Features
node061 rejected : Features
node062 rejected : Features
node063 rejected : Features
node064 rejected : Features
node065 rejected : Features
node066 rejected : Features
node067 rejected : Features
node068 rejected : Features
node069 rejected : Features
node070 rejected : Features
node071 rejected : Features
node072 rejected : Features
node073 rejected : Features
node074 rejected : Features
node075 rejected : Features
node076 rejected : Features
node077 rejected : Features
node078 rejected : Features
node079 rejected : Features
node080 rejected : Features
sandbox1 rejected : Features
sandbox2 rejected : Features
sandbox3 rejected : Features
sandbox4 rejected : Features
sandbox5 rejected : Features
sandbox6 rejected : Features
sandbox7 rejected : Features
sandbox8 rejected : Features
sandbox9 rejected : Features
sandbox10 rejected : Features
node081 rejected : Features
node082 rejected : Features
node083 rejected : Features
node084 rejected : Features
node085 rejected : Features
node086 rejected : Features
node087 rejected : Features
node088 rejected : Features
node089 rejected : Features
node090 rejected : Features
node091 rejected : Features
node092 rejected : Features
node093 rejected : Features
node094 rejected : Features
node095 rejected : Features
node096 rejected : Features
job cannot run in partition SMP (insufficient idle procs available: 0 < 24)
</pre>
How do you interpret all this output?
First of all, if your output does not look anything like this, for example if you see an error message about not being able to contact a server, then the scheduler is not running or there is some other problem. In that case, contact one of the [[administrators]].
Assuming your output looks like the above, first check that the details of your job agree with what you think you submitted: check <tt>NodeCount</tt>, <tt>TasksPerNode</tt> and the <tt>WallTime</tt> request. You may also want to check the output of <tt>qstat -f <jobid></tt>.
Now go down to the line explaining why the <tt>job cannot run in partition DEFAULT</tt>. Normally this will be as above: it is telling you that there are not enough available CPUs for your job. But suppose you think the cluster is close to empty: why is your job still not running?
Look down the list of individual nodes to see why each one is rejecting your job. The example above shows the common reasons:
* Features: this means that you have asked for a feature that the node in question doesn't have. For example, jobs submitted to the main cluster will never run on the [[sandbox]] machines. It is normal for many nodes to reject your job for this reason.
* State: usually this means that the node in question is marked busy by the system, which means that it is running as many jobs as possible. In this example, there are a number of busy nodes. It can also mean that nodes are down, i.e. that there is a real problem. If many nodes reject your job because of 'State' but you think they cannot be busy, check <tt>pbsnodes -a</tt> to see if they are 'down' and report a problem if so.
* CPU: the node is not busy, but it has fewer available CPUs than you have requested. This is most likely because it is running one or more jobs. In the example above, the user has asked for 3 nodes with 8 CPUs each: the nodes rejecting the job because of CPU do not have 8 free CPUs. (If all nodes are rejecting your job for this reason, you should check whether you are asking for an impossible configuration. Jobs submitted to the main cluster asking for more than 8 CPUs per node will sit there forever waiting for a machine with a suitable number of cores to become available. Does your job really need 3 machines with 8 CPUs, or would it work with 24 CPUs distributed over the cluster?)
* ReserveTime: your job asks to run on the node at a time when it is reserved for another purpose. This could be due to another job ahead of you in the queue, or it could be a system reservation. If you see this message but there are no jobs ahead of you in the queue, check your e-mail for a recent message from the administrators regarding scheduled downtime.
If you are seeing all of these and you can convince yourself that the individual nodes are rejecting your jobs for valid reasons, then the only thing you can do is wait for the conditions that are blocking your job to clear. Only if you can't convince yourself of this should you contact the administrators.
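If many nodes are rejecting your job because of State, the following commands can help you see whether nodes are genuinely down before you contact the administrators. This is only a sketch: the node name and job number are taken from the example above, so substitute your own.
<pre>
# List nodes that the resource manager currently marks as down, offline or unknown
pbsnodes -l
# Show the full record for a single node, including its state and features
pbsnodes node004
# Show the resource manager's view of your own queued job
qstat -f 123456
</pre>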
dacbf4610028be6089ab4d51e406f09764c2fb30
Mail
0
18
247
70
2011-09-12T07:25:33Z
Mjh
2
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt>mutt</tt> on the head node.
You are advised to set up a <tt>.forward</tt> file which will send it to your normal inbox. To do this, simply create the file containing one line of text, the e-mail address to forward to:
<pre>
cat <<END >.forward
f.bloggs@herts.ac.uk
END
</pre>
Please don't allow your inbox on the cluster to fill up with large messages. If you do not follow these instructions, the administrators reserve the right to delete your mailbox and/or to set up a <tt>.forward</tt> file for you themselves.
309d1a91aff602965fbcd4c748ce0efb5c69945a
258
247
2012-01-07T09:35:58Z
Mjh
2
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt>mutt</tt> on the head node.
You are advised to set up a <tt>.forward</tt> file which will send it to your normal inbox. To do this, simply create the file containing one line of text, the e-mail address to forward to:
<pre>
cd
cat <<END >.forward
f.bloggs@herts.ac.uk
END
</pre>
Please don't allow your inbox on the cluster to fill up with large messages. If you do not follow these instructions, the administrators reserve the right to delete your mailbox and/or to set up a <tt>.forward</tt> file for you themselves.
8d4c30267ad498f7fd6481c3d4bdda5bd2899cb5
259
258
2012-01-07T09:36:16Z
Mjh
2
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt>mutt</tt> on the head node.
You are advised to set up a <tt>.forward</tt> file in your home directory which will send mail to your normal inbox. To do this, simply create the file containing one line of text, the e-mail address to forward to:
<pre>
cd
cat <<END >.forward
f.bloggs@herts.ac.uk
END
</pre>
Please don't allow your inbox on the cluster to fill up with large messages. If you do not follow these instructions, the administrators reserve the right to delete your mailbox and/or to set up a <tt>.forward</tt> file for you themselves.
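As a quick check (a sketch only; it assumes a command-line <tt>mail</tt> client is available on the head node, which may vary), you can confirm the forward is in place and send yourself a test message:
<pre>
# Confirm the forwarding address is what you expect
cat ~/.forward
# Send a test message to your own account; it should arrive in your normal inbox
echo "cluster mail forwarding test" | mail -s "forwarding test" $USER
</pre>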
8de3df4c1bd76ce2fa835b858d4128e930d2849d
Access
0
5
251
15
2011-10-30T09:23:32Z
Mjh
2
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]] of the cluster is accessible from within the university by ssh to stri-cluster.herts.ac.uk, once you have an [[accounts|account]] set up.
If you are working from a Unix desktop, you should be able to type <tt>ssh username@stri-cluster.herts.ac.uk</tt>. If you are using Windows, you will need a Windows ssh client: we recommend [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY].
Individual compute nodes must be accessed via [[interactive jobs]] run on the head node: see also the [[policies|policy]] relating to this.
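For example, a typical login from a Unix desktop looks like the following (replace <tt>username</tt> with your own account name; add <tt>-X</tt> only if you need X11 forwarding for graphical programs):
<pre>
ssh username@stri-cluster.herts.ac.uk
# with X11 forwarding:
ssh -X username@stri-cluster.herts.ac.uk
</pre>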
dedfe8080401399f50034fe70012b30ad52c6d9e
Storage
0
8
253
153
2012-01-06T12:55:46Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 65 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large astronomical datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
4c3e076ea710b08679b64684e37de1642f5d9b96
Quota
0
38
254
2012-01-06T12:57:06Z
Mjh
2
Created page with 'Use of space on /home is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files o…'
wikitext
text/x-wiki
Use of space on /home is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files on /home.
The current default quota for all users is 20 Gb.
c2b78b679eff0883eb653c0e7a2d46465f133440
255
254
2012-01-07T09:33:34Z
Mjh
2
wikitext
text/x-wiki
Use of space on <tt>/home</tt> is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files on <tt>/home</tt>.
The current default quota for all users is 20 Gb. When you reach 19 Gb, you will be warned and given a period (1 week) in which your usage should be reduced below 19 Gb; if you fail to reduce usage in this period, or if your usage reaches 20 Gb, new file creation will be blocked.
The quota is ''not'' an expected reasonable use for a cluster user. You should try to keep your use of <tt>/home</tt> as low as possible, and certainly lower than 10 Gb.
If you believe you have a specific need to exceed this quota, please contact the cluster [[administrators]].
7dc04054b1fa3505e8b6232bd7a1d0505284664b
256
255
2012-01-07T09:34:25Z
Mjh
2
wikitext
text/x-wiki
Use of space on <tt>/home</tt> is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files on <tt>/home</tt>.
The current default quota for all users is 20 Gb. When you reach 19 Gb, you will be warned and given a period (1 week) in which your usage should be reduced below 19 Gb; if you fail to reduce usage in this period, or if your usage reaches 20 Gb, new file creation will be blocked.
The quota is ''not'' an expected reasonable use for a cluster user. You should try to keep your use of <tt>/home</tt> as low as possible, and certainly lower than 10 Gb.
If you believe you have a specific need to exceed this quota, please contact the cluster [[administrators]].
There is no quota on the various data areas and these are the locations where it is appropriate to store large volumes of data.
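To see how close you are to the limit, you can check your usage from the head node. The <tt>du</tt> command always works; the <tt>quota</tt> command only reports figures if the standard quota tools are enabled on <tt>/home</tt>, which may vary.
<pre>
# Total size of your home directory
du -sh ~
# Per-filesystem quota report, if the quota tools are enabled
quota -s
</pre>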
03786c0ff9a15d66dff2c386673aeeac01310ef8
Fair share
0
39
263
2012-02-15T16:04:12Z
Mjh
2
Created page with 'There are various systems in place to try to make sure that everyone gets a fair share of the cluster (particularly the main cluster) and that a mixture of long and short, large …'
wikitext
text/x-wiki
There are various systems in place to try to make sure that everyone gets a fair share of the cluster (particularly the main cluster) and that a mixture of long and short, large and small jobs can be run.
Jobs are run based on a priority assigned by the scheduler. The priority takes account of a number of factors:
* Jobs that involve more CPUs intrinsically have higher priority. This is to stop single-CPU work pushing out large jobs.
* Jobs that have been in the queue for longer have higher priority, up to a maximum of 64 idle jobs per user (after that, the jobs will wait in the queue but will not accrue priority)
* Jobs from users who have not recently significantly used the cluster have priority over those from users who have. (This is called the fair-share mechanism: the 'fair share amount' is a weighted integral of usage over the last week, weighted towards more recent use.)
In addition,
* no user can have more than 320 processors' worth of jobs running at once. This is intended to stop a single user monopolizing the cluster
* no user can have a processor-time product that exceeds 1 week x 128 nodes running at any given time. This is intended to stop large long jobs blocking shorter jobs.
These policies apply to all queues but are tailored to the main queue where the competition is highest. If they cause you problems, please contact the [[administrators]]. We are happy to review policies to try to get the fairest result for everyone.
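If you want to see how these factors combine for jobs currently in the queue, the Maui client tools can show the priority ordering and a per-job priority breakdown. The sketch below assumes the tools live in <tt>/usr/local/maui/bin</tt>, as for <tt>checkjob</tt> on the [[Why doesn't my job run?]] page.
<pre>
# Eligible (idle) jobs listed in priority order
/usr/local/maui/bin/showq -i
# Breakdown of the priority components for queued jobs
/usr/local/maui/bin/diagnose -p
</pre>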
39a33db9485b808cbd49272ddf4bde7438cbedea
Policies
0
4
264
261
2012-02-15T16:05:04Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a [[fair share]] policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on a timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
* If you are one of the group of people with exclusive or privileged use of a group of nodes, you must use those nodes in preference to the nodes of the main cluster if possible. You should not run jobs in the 'main' queue unless no nodes in your reserved group are available. This applies to members of CAIR and CAR.
011c6a44c426048bbaf7b0f910051871d4c6eda9
292
264
2013-01-29T12:43:20Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* Accounts are for use by the named user only. You must not allow anyone else to use your account.
* The [[architecture|head node]] must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a [[fair share]] policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on a timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
* If you are one of the group of people with exclusive or privileged use of a group of nodes, you must use those nodes in preference to the nodes of the main cluster if possible. You should not run jobs in the 'main' queue unless no nodes in your reserved group are available. This applies to members of CAIR and CAR.
bcb34eddf4440be2b962aaf3f20cdb699ed7e98e
Software
0
17
265
197
2012-05-04T15:04:39Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in /soft/aips
* <u>[[CASA]]</u>: 3.1.0 installed in /soft/casapy-31.0.13530-002-64b
* <u>[[IDL]]</u>: in /soft/idl/idl/bin
* [[Matlab]]: in <tt>/soft/matlab/R2010a/bin</tt>
28b5a029cec577fa8940d0fa3003423c84430061
266
265
2012-05-04T15:04:55Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in /soft/aips
* <u>[[CASA]]</u>: 3.1.0 installed in /soft/casapy-31.0.13530-002-64b
* <u>[[IDL]]</u>: in /soft/idl/idl/bin
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
f77f431e3f07d4eff4cb2e8949abc7b689aad11e
320
266
2013-05-13T10:23:18Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: 3.1.0 installed in <tt>/soft/casapy-31.0.13530-002-64b</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
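To use one of the packages listed above for your current session, add its directory to your <tt>PATH</tt>. For example (bash syntax; csh/tcsh users should use <tt>setenv</tt> instead, as on the [[CASA]] page):
<pre>
# Put Matlab on the PATH for this shell session
export PATH=/soft/matlab/R2010a/bin:$PATH
</pre>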
9a0b79f034d89e1751b4b082658459f22af63a2b
AIPS
0
27
267
175
2012-05-04T15:07:12Z
Mjh
2
wikitext
text/x-wiki
AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips .
To use AIPS you will need to be in the <tt>aipsuser</tt> group.
From the head node, use an [[interactive jobs|interactive job]] to get to the machine you want to use. AIPS is set up on some of the nodes, but most people use the [[SMP machines]]. Be sure to use the <tt>-X</tt> option to get X11 forwarding. Then do <tt>/soft/aips/START_AIPS tv=local</tt>. Disc 1 will be a local disc. Optionally, do <tt>/soft/aips/START_AIPS tv=local da=stri-cluster</tt> to get access to the cluster data area -- but it is recommended that you do not use this for data reduction.
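A typical session therefore looks like the sketch below. The four-hour walltime and the choice of the smp queue are only illustrations; pick values that match your work.
<pre>
# From the head node: request an interactive session with X forwarding
qsub -I -X -l walltime=04:00:00 -q smp
# On the node you are given:
/soft/aips/START_AIPS tv=local
</pre>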
9a5816cb5a22a1b79b7b8bfe8da3ff514d2b6e38
CASA
0
28
268
176
2012-05-04T15:08:24Z
Mjh
2
wikitext
text/x-wiki
CASA is software for radio astronomy data reduction. It is installed on the cluster in /soft/casapy-stable-34.0.17353-001-64b .
To use casa, do <tt>setenv PATH /soft/casapy-stable-34.0.17353-001-64b:$PATH</tt> and then run it with <tt>casapy</tt>.
You should not run CASA on the head node: either run it through the batch job system or use an [[interactive jobs|interactive job]].
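For example, inside an interactive job (the csh line matches the instructions above; the bash line is the equivalent for bash users):
<pre>
# csh/tcsh
setenv PATH /soft/casapy-stable-34.0.17353-001-64b:$PATH
# bash equivalent
export PATH=/soft/casapy-stable-34.0.17353-001-64b:$PATH
# then start CASA
casapy
</pre>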
d5ef8ed73ef8185c0ed5452ab3eeccec18d84c3d
Interactive jobs
0
35
269
215
2012-05-04T15:10:25Z
Mjh
2
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case forbidden by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, unless explicitly authorized otherwise, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@stri-cluster ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=48 -I -q smp
</pre>
will reserve all 48 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== Specific machines ===
It is possible to request a specific machine just as for normal non-interactive [[jobs]]:
<pre>
qsub -l walltime=00:01:00 -l nodes=smp2:ppn=48 -I -q smp
</pre>
Do this if you know that you need to use some particular feature of the machine, such as the [[SMP machines]]' scratch discs.
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
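For example (the two-hour walltime is just an illustration):
<pre>
qsub -l walltime=02:00:00 -I -X -q main
</pre>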
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
1e8d3294e31916a19138ea956e787bb1e7f16c17
Storage
0
8
270
253
2012-05-04T15:13:06Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
* 19 Tb of scratch for CAIR users only, mounted as /cair-data3
* 39 Tb of scratch for CAIR users only, mounted as /cair-data4
* 19 Tb of scratch for CAIR users only, mounted as /cair-data5
* 58 Tb of scratch for CAR users only, mounted as /car-data
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
a439f5eb6ce2bc83843f13597785284514a7b8aa
277
270
2012-09-18T10:21:25Z
Jonnya
9
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
* 19 Tb of scratch for CAIR users only, mounted as /cair-data3
* 39 Tb of scratch for CAIR users only, mounted as /cair-data4
* 19 Tb of scratch for CAIR users only, mounted as /cair-data5
* 58 Tb of scratch for CAR users only, mounted as /car-data
* 77 Tb of scratch for CAIR users only, mounted as /dair-storage
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
8dd43a1eb9e34f58f182dbfd614e00dc0db97c87
278
277
2012-09-18T10:21:33Z
Jonnya
9
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
* 19 Tb of scratch for CAIR users only, mounted as /cair-data3
* 39 Tb of scratch for CAIR users only, mounted as /cair-data4
* 19 Tb of scratch for CAIR users only, mounted as /cair-data5
* 58 Tb of scratch for CAR users only, mounted as /car-data
* 77 Tb of scratch for CAIR users only, mounted as /dair-storage
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
dcb9c555370a54c8aea5253a3aef63387ca5e527
281
278
2012-10-08T10:41:41Z
Jonnya
9
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 40 Tb of scratch for CAIR users only, mounted as /cair-data
* 19 Tb of scratch for CAIR users only, mounted as /cair-data3
* 39 Tb of scratch for CAIR users only, mounted as /cair-data4
* 19 Tb of scratch for CAIR users only, mounted as /cair-data5
* 58 Tb of scratch for CAR users only, mounted as /car-data
* 77 Tb of scratch for CAIR users only, mounted as /cair-storage
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
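One convenient way to follow this advice is to keep large output on the appropriate data disk and link it into your home directory. The layout of per-user directories on the data disks is not prescribed here, so treat the paths below as an example only:
<pre>
# Keep job output on the scratch area...
mkdir -p /stri-data/$USER/results
# ...and make it conveniently reachable from your home directory
ln -s /stri-data/$USER/results ~/results
</pre>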
a6b5e3cd645d52a4298937abed7452ed0d62e7f3
Architecture
0
7
272
248
2012-07-25T07:26:01Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2 socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 124 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development)
* 200 Tb of [[storage]] attached via Fibre Channel to the head node
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 8 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011.
http://stri-cluster.herts.ac.uk/cluster2.jpg
b369c918022b53855a55d98fa66083a88318b24d
276
272
2012-09-18T10:20:31Z
Jonnya
9
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2 socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 124 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A CAIR data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development).
* 200 Tb of [[storage]] attached via Fibre Channel to the head node.
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 8 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011.
http://stri-cluster.herts.ac.uk/cluster2.jpg
74d85317aa7714ce2e7e818d7ff9865def222a0a
291
276
2012-12-22T07:40:26Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2 socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 124 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A CAIR data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development).
* 200 Tb of [[storage]] attached via Fibre Channel to the head node.
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 8 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
66bb7257abf5e272ee0c9b0bb10880b702fcbe7b
Queues
0
15
273
243
2012-07-25T07:27:10Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to the 46 CAIR nodes, with a maximum wall time of 6 hours. Users may have at most 2 jobs in this queue at a time.
* 'cair_l' also submits to the CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
c940bf301186b982ca9d63ab2470f6b919e8a8df
274
273
2012-07-25T07:28:29Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
5f6933bb3074d7034c2e93277cfa4eacc31f293e
297
274
2013-02-19T12:31:41Z
Jonnya
9
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 96 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
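The queue is chosen at submission time with the <tt>-q</tt> option to <tt>qsub</tt>. A minimal sketch, with <tt>myjob.sh</tt> standing in for your own job script:
<pre>
qsub -q sandbox myjob.sh    ## quick test run on the sandbox machines
qsub -q smp myjob.sh        ## run on one of the SMP machines
qsub myjob.sh               ## no -q given: the job goes to the default 'main' queue
</pre>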
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
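For example (a sketch, with <tt>longjob.sh</tt> standing in for your own script), to ask the 'main' queue for three days rather than the 24-hour default:
<pre>
qsub -q main -l walltime=72:00:00 longjob.sh
</pre>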
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
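Both defaults can be overridden with <tt>-l</tt> resource requests at submission time. A sketch, with <tt>bigjob.sh</tt> standing in for your own script:
<pre>
qsub -l nodes=4:ppn=8 -l pmem=4gb bigjob.sh    ## 4 nodes, 8 processes per node, 4 Gb per process
</pre>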
8e82aad01fc3ed7e5ab17264e64545363d2c8b2c
Cair-cluster
0
40
275
2012-09-18T10:06:45Z
Jonnya
9
Created page with '== Cair data processing server == There is now a dedicated file server for cair users. The hostname is <code>cair-cluster</code> which is accessible from the private data netwo…'
wikitext
text/x-wiki
== Cair data processing server ==
There is now a dedicated file server for cair users. The hostname is <code>cair-cluster</code>, which is accessible from the private data network and the UH student network (using the FQDN <code>cair-cluster.herts.ac.uk</code>). The server is a Dell R520 with two Intel Xeon E5-2450L 1.80GHz processors and 32 GB RAM. It is connected to the "cair" InfiniBand network (192.168.4.0) via a dual-port QDR HBA. The server has ~77 TB of directly attached (fibre channel) storage, configured as RAID6 and mounted as /cair-storage on all cair nodes and the head node.
This server can be used for post-processing of large datasets. We have also enabled job submission on this server, so, if preferred, cair users do not have to log on to <code>stri-cluster</code> at all.
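For example (a sketch: replace <tt>username</tt> with your own account name and <tt>myjob.sh</tt> with your own script), a cair user on the UH network could log in and submit a job directly from this server:
<pre>
ssh username@cair-cluster.herts.ac.uk
qsub -q cair_l myjob.sh
</pre>
The usual <tt>qsub</tt> options described under [[Jobs]] apply.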
cdc9a7fbd101f5bdc1651e44ead8630ec03d25ae
Jobs
0
9
279
250
2012-09-30T11:03:05Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of it, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
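If what you want to run is a compiled binary, wrap it in a short script first. A minimal sketch (<tt>my_binary</tt> and the directory are placeholders):
<pre>
cat << END > runbinary.sh
#!/bin/sh
cd /home/fred/my_working_directory
./my_binary
END
qsub runbinary.sh
</pre>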
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
2d4e202726d7f95e646b7017bca438855f0e1bac
293
279
2013-01-31T15:38:56Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of it, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running OpenMP code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
279cf2f62e243aed2b09f2dcff06c8275c69bb2c
294
293
2013-01-31T15:39:35Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of it, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format.
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
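As a sketch of how <tt>$PBS_ARRAYID</tt> might be used in practice (<tt>run_simulation</tt> and its arguments are placeholders for your own program):
<pre>
cat << 'END' > mc-array.qsub
#!/bin/sh
#PBS -N mc-array
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
./run_simulation --seed $PBS_ARRAYID > output.$PBS_ARRAYID
END
qsub -t 1-10 mc-array.qsub    ## ten jobs, $PBS_ARRAYID from 1 to 10
</pre>
Here the 'END' delimiter is quoted so that the <tt>$PBS_</tt> variables are written into the script literally rather than being expanded by the shell that creates the file.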
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (regardless of its exit status).
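Since <tt>qsub</tt> prints the identifier of the job it has just submitted, you can chain jobs without typing job numbers by hand. A sketch, with <tt>stage1.qsub</tt> and <tt>stage2.qsub</tt> standing in for your own scripts:
<pre>
FIRST=`qsub stage1.qsub`                    ## capture the job id printed by qsub
qsub -W depend=afterany:$FIRST stage2.qsub  ## runs only once the first job has finished
</pre>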
4175245614aa6f63924fae71a359571f27fb3074
311
294
2013-04-11T13:43:56Z
Jonnya
9
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of it, and code that depends on it, rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
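For example (the job numbers are illustrative):
<pre>
qdel 1771                           ## remove job 1771 from the queue
qalter -l walltime=12:00:00 1772    ## change the walltime request of queued job 1772
qmove sandbox 1773                  ## move job 1773 to the 'sandbox' queue
</pre>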
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`   # use setenv instead of export in csh-type scripts
</pre>
in the qsub script before the code runs.
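A minimal sketch of a complete OpenMP job script, assuming your binary is called <tt>my-openmp-code</tt> and lives in your home directory, might look like this:
<pre>
#!/bin/sh
#PBS -N openmp-demo
#PBS -l nodes=1:ppn=8
#PBS -l walltime=01:00:00
#PBS -k oe
# take only the CPUs allocated to this job on the single node
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
cd $PBS_O_WORKDIR
$HOME/my-openmp-code
</pre>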
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
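As a sketch, a generic script controlled this way might look like the following (the variable, program and file names are purely illustrative):
<pre>
#!/bin/sh
#PBS -N generic-job
#PBS -l nodes=1
#PBS -l walltime=01:00:00
# NAME is supplied at submission time with: qsub -v NAME=fred myjob.sh
cd $PBS_O_WORKDIR
echo "Running analysis for dataset $NAME"
./analyse-dataset $NAME > $NAME.log
</pre>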
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
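A sketch of an array job script (the program name and output naming are purely illustrative) might be:
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1
#PBS -l walltime=05:00:00
cd $PBS_O_WORKDIR
# each member of the array writes to its own output file, selected by $PBS_ARRAYID
./my-monte-carlo-code > output.$PBS_ARRAYID
</pre>
Submitted with <tt>qsub -t 1-4 myjob.qsub</tt>, this would queue four jobs writing to <tt>output.1</tt> ... <tt>output.4</tt>.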
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
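Since <tt>qsub</tt> prints the ID of the job it has just created, you can chain dependent jobs from the shell without copying IDs by hand; a sketch (the script names are illustrative):
<pre>
FIRST=`qsub stage1.qsub`
qsub -W depend=afterany:$FIRST stage2.qsub
</pre>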
bc71e160b9c06a218eb3d296ec0e2a9f45b0e4fd
Known problems
0
25
280
252
2012-09-30T11:05:39Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and bandwidth lower than it should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* Most of the compute nodes run FC16, but the head node, where code is compiled, is still running FC14. This can cause library incompatibilities. Let us know if this affects you and we can work around it.
a33d0cd11d8901cf9ee1d5bc607fb9a7d66823e2
296
280
2013-02-08T21:25:39Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and bandwidth lower than it should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
529c6c4254222a7fc6a4546e7d1e9f2ac3317954
Cluster bibliography
0
30
282
249
2012-11-30T11:55:40Z
Mjh
2
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', doi:10.1016/j.jsb.2011.07.012
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', doi:10.1016/j.ejmech.2011.05.026
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published in ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
df93f749958a5cdf58c7c71693b91a7b4189767c
Main Page
0
1
283
262
2012-12-07T09:53:17Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
* [[Upgrade wishlist]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
476d40722cea3afdf130b171b2bc674af5f5dddc
290
283
2012-12-22T07:36:45Z
Mjh
2
/* Welcome to the cluster documentation wiki */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
* [[Upgrade wishlist]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
5a12b4bd70b4f598a26fbe0ac925bb8cf2728969
302
290
2013-02-23T12:58:46Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
* [[Upgrade wishlist]]
== Cluster basics ==
* [[Accounts]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
eb0e0b057a5aa50ff5cbc1ce1e41ac0fa3481cd6
308
302
2013-03-28T16:46:12Z
Mjh
2
/* Cluster basics */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
* [[Upgrade wishlist]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
b2c8591be0ae5b9f963a3c14ca94f8a2670b04a7
312
308
2013-04-12T13:51:18Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
5b7f02ea02b005a2ecda466a864fdecf00f88ab3
316
312
2013-04-22T14:33:10Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
85dc8721dcb74915a08d48a6c0f5af51e05380b1
Web server
0
32
289
189
2012-12-21T20:13:43Z
Mjh
2
wikitext
text/x-wiki
The web server <tt>http://stri-cluster.herts.ac.uk/</tt> is visible inside and outside the university.
If you create a directory <tt>public_html</tt> in your home directory, then its contents are visible at <tt>http://stri-cluster.herts.ac.uk/~your-username/</tt>. You can use this to export data; for large datasets, use symbolic links to a data disc. Do not rely on the long-term existence of this facility (e.g. you should not use the cluster to host your personal home page).
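For example, a user might set this up as follows (the data-disc path is purely illustrative; use whichever [[storage]] location actually holds your data):
<pre>
mkdir ~/public_html
chmod 755 ~/public_html                          # the web server needs read access
ln -s /data/fred/results ~/public_html/results   # export a large dataset via a symbolic link
</pre>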
74a170da27194419e2bf9f4794620e2fa550a199
User:Aidan Farrow
2
42
298
2013-02-23T11:08:36Z
WikiSysop
1
Creating user page with biography of new user.
wikitext
text/x-wiki
I am a research fellow in CAIR working with numerical models of global climate and atmospheric chemistry.
4d3845a664986fec56e7f679e609e8e99e6040e2
User talk:Aidan Farrow
3
43
299
2013-02-23T11:08:36Z
WikiSysop
1
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [[Help:Contents|help pages]].
Again, welcome and have fun! [[User:WikiSysop|WikiSysop]] 11:08, 23 February 2013 (GMT)
4b83af547e2c44a16866c2471cbe5d05f8d1f863
Parallelization
0
14
300
214
2013-02-23T12:00:49Z
Mjh
2
wikitext
text/x-wiki
It is very important to realise that running a job on the cluster does not automatically give you access to all the CPU resources of the cluster (or even of the subset of the cluster you request from the [[jobs|job control system]]).
There is one, and only one, situation in which you can run normal, single-threaded code on the cluster and get an improvement in performance. That is the situation in which ''you can usefully run multiple copies of the same program simultaneously without any communication between the different copies''; in other words, you are only limited by how many CPUs you can get access to. Examples might include Monte Carlo simulation or some data analysis tasks. Problems of this kind are called ''embarrassingly parallel''. Provided that your code is thread-safe — that is, it doesn't have implementation errors that prevent it being run several times simultaneously, such as use of temporary files with the same name — you can use the cluster for this sort of problem without modifying your code. You may be able to use the [[jobs|job control system]] with commands such as <tt>pbsdsh</tt>, or you may need to run an [[interactive jobs|interactive job]].
Once you need the different parts of your code to communicate, you in general ''must'' use code that is written specifically to do so. At present [[MPI]] is the method provided to do this. If you are planning to run an application written by a third party, it needs to use [[MPI]] if you expect to be able to run on more than one node. If you are running your own code, you will need to modify it, often very substantially, to allow parallel execution. '''This is your responsibility, not that of the cluster administrators.'''
If you just intend to use all the processors on one node, and are not worried about explicit communication between threads, you can use [[OpenMP]], which will often be much less effort to integrate with existing code.
4a80896994505aad9c4ad76ea4f223cbdbe70834
OpenMP
0
44
301
2013-02-23T12:58:27Z
Mjh
2
Created page with "OpenMP is an extension to commonly used programming languages that allows them to make use of the multiple processors ('cores') that are available on all modern PCs, including th..."
wikitext
text/x-wiki
OpenMP is an extension to commonly used programming languages that allows them to make use of the multiple processors ('cores') that are available on all modern PCs, including the cluster nodes. In the best case, you can take existing code, add a few lines, and have parallelizable parts of your code, like loops, running on all available CPUs.
Here's a simple example C program that runs multithreaded:
<pre>
int main(int argc, char *argv[]) {
    const int N = 100000;
    int i, a[N];
    #pragma omp parallel for private(i)
    for (i = 0; i < N; i++)
        a[i] = 2 * i;
    return 0;
}
</pre>
In this example, the loop is parallelized so that all available cores contribute to filling up the array.
OpenMP tutorials are available online, e.g. [http://bisqwit.iki.fi/story/howto/openmp/].
The C and Fortran compilers available on the cluster support OpenMP. For example, gcc works with the flag <tt>-fopenmp</tt>:
<pre>
gcc -fopenmp code.c -o code
</pre>
If you do not compile with the correct flag set, the <tt>#pragma</tt> directives will be ignored!
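When running the compiled binary outside the batch system (for example a quick test in an [[interactive jobs|interactive job]]), you can limit the number of threads with the standard <tt>OMP_NUM_THREADS</tt> environment variable:
<pre>
export OMP_NUM_THREADS=4   # use four threads rather than every core on the machine
./code
</pre>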
If you want to run OpenMP code from a job, see the relevant section of the [[Jobs]] page to make sure that you honour your allocation of CPUs.
738e8a6388012a9b0eed75086a2028e58583345e
Mail
0
18
303
259
2013-02-23T13:00:00Z
Mjh
2
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be delivered to a local mailbox: you will need to read it with a command-line mail tool like <tt>mutt</tt> on the head node.
You are advised to set up a <tt>.forward</tt> file in your home directory which will send mail to your normal inbox. To do this, simply create the file containing one line of text, the e-mail address to forward to. The commands below can be cut and pasted into a shell:
<pre>
cd
cat <<END >.forward
f.bloggs@herts.ac.uk
END
</pre>
or you can edit the <tt>.forward</tt> file with your favourite editor.
Please don't allow your inbox on the cluster to fill up with large messages. If you do not follow these instructions, the administrators reserve the right to delete your mailbox and/or to set up a <tt>.forward</tt> file for you themselves.
e9e3cb5d4b653ed9924f3fc102826711f754f64c
Acknowledgements
0
29
304
180
2013-02-25T15:48:42Z
Mjh
2
wikitext
text/x-wiki
If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it.
We don't insist on any explicit form of words, but an example might be 'This work has made use of the University of Hertfordshire Science and Technology Research Institute high-performance computing facility.'
If you wish you can add a link to <tt>http://stri-cluster.herts.ac.uk/</tt>.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Cluster bibliography]] page.
239d2a376a91b0174e1ffcb03598ba61a990138c
Administrators
0
6
305
24
2013-03-13T10:16:06Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* John Atkinson, j.atkinson@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 2E71 (Innovation Centre)).
Contact us with queries. Basic support queries (e.g. account requests, difficulty logging on or using software) should be directed to John in the first instance.
a70b6af6fab85d47c1bb04ba21f4e126915e546a
Accounts
0
3
306
106
2013-03-28T16:41:38Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to John Atkinson in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research (CAIR)
* Other research-active members of the School of Physics, Astronomy and Mathematics (PAM)
* Members of the School of Computer Science (CS)
* Others, by special arrangement; restricted to those who have made a financial contribution to the cluster.
Access is granted subject to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
a2b5c1d7bc64cd9c8bcdab1a29b8aa709654bc88
Account cancellation policy
0
45
307
2013-03-28T16:45:49Z
Mjh
2
Created page with "The policy for account closure and deletion is as follows: * People holding accounts on the cluster who are leaving UH or who no longer require their account should let the [[ad..."
wikitext
text/x-wiki
The policy for account closure and deletion is as follows:
* People holding accounts on the cluster who are leaving UH or who no longer require their account should let the [[administrators]] know that they are doing so; supervisors are also responsible for letting us know about departing students, postdocs and visitors.
* When a cluster user leaves UH, their account will be locked (i.e. the account will still exist but it will no longer be possible to log in). A grace period of up to one month will be available, on request, to allow data to be moved off the cluster.
* When an account is locked, we will send an e-mail to the user (if contactable) and their supervisor (if known) telling them that the account has been locked and referring them to this policy.
* Three months after a user has left, all remaining data owned by that user will be completely deleted from the system. It is the responsibility of the owner of the data, or their supervisor, to make sure that all important data has been copied elsewhere before this happens. We will not keep backups of deleted data, nor will we chase people who do not appear to have taken the necessary action.
* Supervisors or other colleagues may ask to take ownership of data belonging to students or postdocs who are leaving, but they thereby also take full responsibility for dealing with any impact it has on the system.
* If a leaving student or member of staff is expected to be given visiting research fellow/lecturer/professor status, they may ask to be exempted from this policy. Other exemptions/extensions must be negotiated via senior management (e.g. the directors of CAR or CAIR, the Deans of PAM or CS, etc).
dfde54e8c98130b35dc9032c1c9bf956d036371e
309
307
2013-03-28T16:46:40Z
Mjh
2
wikitext
text/x-wiki
The policy for account closure and deletion is as follows:
* People holding accounts on the cluster who are leaving UH or who no longer require their account should let the [[administrators]] know; supervisors are also responsible for letting us know about departing students, postdocs and visitors.
* When a cluster user leaves UH, their account will be locked (i.e. the account will still exist but it will no longer be possible to log in). A grace period of up to one month will be available, on request, to allow data to be moved off the cluster.
* When an account is locked, we will send an e-mail to the user (if contactable) and their supervisor (if known) telling them that the account has been locked and referring them to this policy.
* Three months after a user has left, all remaining data owned by that user will be completely deleted from the system. It is the responsibility of the owner of the data, or their supervisor, to make sure that all important data has been copied elsewhere before this happens. We will not keep backups of deleted data, nor will we chase people who do not appear to have taken the necessary action.
* Supervisors or other colleagues may ask to take ownership of data belonging to students or postdocs who are leaving, but they thereby also take full responsibility for dealing with any impact it has on the system.
* If a leaving student or member of staff is expected to be given visiting research fellow/lecturer/professor status, they may ask to be exempted from this policy. Other exemptions/extensions must be negotiated via senior management (e.g. the directors of CAR or CAIR, the Deans of PAM or CS, etc).
db8a8c994225b20da4ba6f76905f81c36c6acf28
MPI
0
12
310
220
2013-04-11T12:46:02Z
Gr09aag
8
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/local/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
For documentation of <tt>mpiexec</tt>, see [http://www.osc.edu/~djohnson/mpiexec/index.php]. By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node). If for some reason you want to alter this default behaviour (e.g. if your MPI code needs to run multi-threaded, as would be the case if you were mixing MPI and OpenMP), then <tt>mpiexec</tt> has options to allow this: run <tt>/usr/local/bin/mpiexec</tt> on the head node with no arguments to get a list of possible options.
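For instance, for a hybrid MPI/OpenMP code you might want just one MPI process per node, leaving the remaining cores for OpenMP threads. A sketch, assuming the installed <tt>mpiexec</tt> provides the <tt>-pernode</tt> option (check its option list as described above):
<pre>
export OMP_NUM_THREADS=8
/usr/local/bin/mpiexec -pernode /home/myusername/my-hybrid-code
</pre>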
=== MPICH2 (local versions) ===
Locally compiled, more up-to-date versions of MPICH2 are available. To use these use <tt>modules</tt> commands: do
<pre>
module unload mpich2-x86_64
module load mpich2-local
OR
module load mpich2-intel
</pre>
Then
<pre>
which mpicc
/soft/mpich2/bin/mpicc
</pre>
If you wish to use these permanently, then you are recommended to put these module commands in your .cshrc or .bashrc.
Jobs compiled this way should also be run with /usr/local/bin/mpiexec.
=== MVAPICH2 ===
MVAPICH2 is one of the three MPI implementations installed as part of the [http://www.openfabrics.org/ OFED] Infiniband software distribution. It uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available. This does not work on the SMP machines (the hardware is incompatible); use e.g. MPICH2 there instead, or use the main queue.
To use MVAPICH2 do
<pre>
module unload mpich2-x86_64
module load mvapich2
</pre>
Then you should see
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-1.4.1/bin/mpicc
</pre>
<tt>mpiexec</tt> also works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/local/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== MVAPICH ===
This is an earlier Infiniband-aware implementation of MPI. Not recommended.
=== OpenMPI ===
This is the third implementation provided by the OFED packages; like the others, it can be selected via the [[modules]] system. In principle it should be as good as MVAPICH2 from the point of view of Infiniband use. If you want to use it you will need to integrate it with Torque yourself -- please feel free to do so and to document it here.
7b8699a565fd972e5ff9c2a57765ccb749985f6d
Networking
0
10
313
245
2013-04-15T11:43:12Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected together via a main ethernet switch. The links connecting the head node and the SMP machines, and the uplinks from the two node internal ethernet stacks, are carried by multiple physical cables via link aggregation. A separate physical ethernet network carries management traffic.
The infiniband network is slightly more complex. Each chassis has an internal infiniband switch and these are all linked via two main infiniband switches. This arrangement is intended to provide redundancy and higher bandwidth between nodes in different chassis. chassis1-3 use DDR infiniband; all other machines on the network have QDR infiniband cards.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is somewhat higher, and data transfer rates somewhat lower, between nodes in different chassis of the same sub-cluster than between nodes in the same chassis, and ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still. Best results will be obtained by running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all see the public Internet via IP masquerading through the head node over the ethernet network. (For this reason, it is not sensible to try to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The two SMP machines have addresses smp1.data, smp1.infi etc.
c5c5f682db0effed1c93288ae2f7197f78665f33
Reservations
0
46
314
2013-04-22T14:32:08Z
Mjh
2
Created page with "It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it. You ca..."
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically.
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command <tt>sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</tt>, where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
2ea9895e49c1cf343f63fa19e6a60ffdd7396cdf
315
314
2013-04-22T14:32:43Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically.
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</pre>
where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
fbbaafdd29b846cc1fdd226c9157069785993f85
317
315
2013-04-22T14:34:18Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically. Please only ask for a reservation if you believe that there is no method of doing what you want within the standard [[Jobs|queueing system]].
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</pre>
where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
8d095ba70468c522b7b804ec04e899a15aff0b30
318
317
2013-04-22T16:08:42Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically. Please only ask for a reservation if you believe that there is no method of doing what you want within the standard [[Jobs|queueing system]].
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</pre>
where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
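For example, to reserve smp2 from 09:00 on the 20th of June until 09:00 on the 22nd of June (the machine and times are purely illustrative):
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh smp2 09:00_06/20 09:00_06/22
</pre>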
General guidelines for user-created reservations are as follows:
* Reserve the machine for a period when no existing jobs will be running -- use <tt>qstat</tt> to determine when this will be.
* Use the reservation by submitting a job as normal: any reservation available to you will be used automatically. If you need an interactive session, use an [[interactive jobs|interactive job]]; e.g. if you have reserved smp2 for two days, do <tt>qsub -I -q smp -l nodes=smp2:ppn=48 -l walltime=48:00:00</tt>.
* If you no longer need a reservation, e-mail the administrators to ask them to delete it.
ad2f7aa766ddfc4f8b6ec0509b53da0362b568d9
Memory
0
36
319
225
2013-05-01T11:25:30Z
Mjh
2
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have 24 Gb of physical memory. If the total amount of memory used by all jobs running on a given node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the nodes may even crash (as the Linux out-of-memory killer appears often to leave a node in an unstable state).
To make sure that this doesn't happen to your job (or, worse, that your job causes it to happen to someone else's) you should specify the amount of physical memory used per process, if it exceeds the default of 1 Gb, using the <tt>pmem</tt> attribute in the job control system. So, for example, if you need 8 Gb of memory per process for 8 processes, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient. Please bear in mind that the typical cluster job runs very comfortably in 1 Gb. You can see how much physical memory a running job is using by doing <tt>qstat -f <jobid></tt>: the line <tt>resources_used.mem</tt> tells you the total memory use for all processes.
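For example (the job ID is illustrative):
<pre>
qstat -f 1765 | grep resources_used.mem
</pre>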
Users needing very large amounts of memory should consider using the [[SMP machines]], which have 256 Gb of physical memory. (If doing this, it is still sensible to use the <tt>pmem</tt> job attribute to make sure that you will not interfere with other users.)
f7e7cd4642ede90e2f7c1b9cbc94c0806f028cc2
Software
0
17
321
320
2013-05-13T10:24:32Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: 3.1.0 installed in <tt>/soft/casapy-31.0.13530-002-64b</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
53942b3fab7435325534c9c0b492b606116b7464
322
321
2013-05-13T10:24:46Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.0.7 installed in <tt>/soft/gromacs</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: 3.1.0 installed in <tt>/soft/casapy-31.0.13530-002-64b</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
d29cd42d1f44a401b2af01dbcf6f49ca958deddc
328
322
2013-06-14T13:39:07Z
Akukol
3
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: 3.1.0 installed in <tt>/soft/casapy-31.0.13530-002-64b</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
73e86f0b08afc30532943c383a45daa5fcafc35d
333
328
2013-06-25T13:24:34Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar</tt>
b0e4f7225a7ee1a2de273da445156ccc5294c39b
346
333
2013-08-16T09:18:58Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar</tt>
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
de96ae074a0dbbf6d8c283eb39386d58307cb00a
364
346
2013-11-26T17:24:21Z
Dbab
11
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar</tt>
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
271ba82bf221032fcf3feffd172213771628ae57
Jobs
0
9
323
311
2013-05-13T11:26:14Z
Mjh
2
/* Jobs that depend on other jobs */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change the resource requests of a queued job with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only run on a single node, you need to make sure that the code does not grab all available CPUs. An easy way to restrict it to the CPUs you have been allocated is to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
(or <tt>setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`</tt> if your qsub script uses csh) in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously (a job array). These all run the same script but differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>; your script should use this variable to decide what each instance does. For example, multiple runs of a Monte Carlo simulation might use it to choose different output files. The jobs are all submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to run at the same time.
A common mistake is to misread the argument that <tt>-t</tt> takes:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
6f17c8a4134aca53508cb960eaeb766926f59ac9
340
323
2013-07-28T16:23:09Z
Mjh
2
/* Basic commands */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://www.clusterresources.com/torquedocs21/ Torque] (formerly known as PBS) and [http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system: various parts of it, and code that depends on it, rely on being able to use ssh or scp between nodes without a password.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
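If all you want to run is a single binary, the usual workaround is a one-line wrapper script. A minimal sketch (the binary name <tt>mycode</tt> is a placeholder):
<pre>
# create a wrapper script; the quoted END stops $PBS_O_WORKDIR being expanded now
cat > run-mycode.sh << 'END'
#!/bin/sh
cd $PBS_O_WORKDIR
./mycode
END
qsub run-mycode.sh
</pre>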
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://www.clusterresources.com/torquedocs21/commands/qsub.shtml online]. Some important ones are as follows (a combined example follows the list):
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and standard error streams of the job should be mirrored in your home directory while the job runs. If you specify this, you can watch the output as it is generated, but it will appear in your home directory. If you don't, the output will be stored in the locations given by -o and -e (or in the working directory of qsub if these are not specified), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
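As a combined illustration of the options above, the directives below name the job, request e-mail notification, merge and redirect the output streams and select a queue. This is only a sketch: the paths, queue choice and program name are placeholders, not recommendations.
<pre>
#!/bin/sh
#PBS -N myanalysis
#PBS -m abe
#PBS -j oe
#PBS -o /home/fred/logs
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR
./my-analysis arg1
</pre>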
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the queue's default walltime applies; see [[Queues]]).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
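For example (the job numbers and queue name here are illustrative):
<pre>
qdel 1770                        # remove job 1770 from the queue
qalter -l walltime=2:00:00 1771  # change the walltime request of queued job 1771
qmove sandbox 1772               # move queued job 1772 to the sandbox queue
</pre>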
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml#submitvars here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts that start [[MPI]] processes in such a way that inter-process communication (IPC) works, see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only run on a single node, you need to make sure that the code does not grab all available CPUs. An easy way to restrict it to the CPUs you have been allocated is to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
(or <tt>setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`</tt> if your qsub script uses csh) in the qsub script before the code runs.
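A minimal sketch of a complete OpenMP job script, assuming a single-node executable called <tt>my-openmp-code</tt> (a placeholder name) in the submission directory:
<pre>
#!/bin/sh
#PBS -N openmp-test
#PBS -l nodes=1:ppn=8
#PBS -l walltime=01:00:00
#PBS -j oe
cd $PBS_O_WORKDIR
# use only the CPUs the scheduler has allocated
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
./my-openmp-code
</pre>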
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
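For example, a generic script driven by a parameter passed with <tt>-v</tt> might look like this sketch (the program name <tt>my-code</tt> is a placeholder):
<pre>
#!/bin/sh
#PBS -N generic
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
# NAME is supplied at submission time: qsub -v NAME=fred generic.sh
./my-code --dataset "$NAME" > "$NAME.log"
</pre>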
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously (a job array). These all run the same script but differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>; your script should use this variable to decide what each instance does. For example, multiple runs of a Monte Carlo simulation might use it to choose different output files. The jobs are all submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to run at the same time.
A common mistake is to misread the argument that <tt>-t</tt> takes:
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
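A sketch of an array-aware script (the simulation program and its options are placeholders):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR
# each array element writes its own output file
./simulate --seed $PBS_ARRAYID > run_$PBS_ARRAYID.out
</pre>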
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
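Because <tt>qsub</tt> prints the identifier of the newly created job, you can chain submissions from the shell; a sketch (the script names are placeholders):
<pre>
FIRST=`qsub stage1.qsub`
qsub -W depend=afterany:$FIRST stage2.qsub
</pre>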
456ef6f004de8e76f329cd512db523520114db2f
Queues
0
15
324
297
2013-05-16T15:07:39Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 48 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
5f6933bb3074d7034c2e93277cfa4eacc31f293e
339
324
2013-07-18T15:39:30Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 56 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the two [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
4b40bed020124f8d755fb8bafdcff6117ac664c5
344
339
2013-08-13T16:44:56Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available on the system:
* 'main' is the default queue: this submits to the 56 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
3ca84b2e818ba4ed5bcb0e87850740d16e3de376
345
344
2013-08-13T16:45:36Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available for general use on the system:
* 'main' is the default queue: this submits to the 56 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'sandbox' submits to the [[sandbox]] machines and is for use for development and testing. The maximum wall time for this queue is 1 hour.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time for the 'sandbox' queue is also the maximum, i.e. 1 hour.
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
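For example, to ask for three days on the main queue (the script name is a placeholder):
<pre>
qsub -q main -l walltime=72:00:00 myjob.sh
</pre>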
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on the main queue is 1.
02b8e5ab9d79e27a9acd1c9e88ec2931f4e1426a
Reservations
0
46
325
318
2013-05-17T08:03:19Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no jobs other than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically. Please only ask for a reservation if you believe that there is no method of doing what you want within the standard [[Jobs|queueing system]].
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</pre>
where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
Reservations will usually be for a group of people, but may be for an individual. If you need to use a group reservation, you will need to know the name of the group in question, and you will need to belong to that group. Typing <tt>groups</tt> at a shell prompt on the head node will tell you what groups you belong to.
General guidelines for user-created reservations are as follows:
* Reserve the machine for a period when no existing jobs will be running -- use <tt>qstat</tt> to determine when this will be.
* If you are using a personal reservation, simply submit a job as normal: any reservation available to you will be used automatically. If you need an interactive session, use an [[interactive jobs|interactive job]]; e.g. if you have reserved smp2 for two days, do <tt>qsub -I -q smp -l nodes=smp2:ppn=48 -l walltime=48:00:00</tt>.
* If you are using a group reservation, specify that you want to use it by adding the option <tt>-W group_list=[groupname]</tt> to the <tt>qsub</tt> command or script. E.g. to use 8 cores of the <tt>scuba2</tt> group reservation on smp1 interactively, do <tt>qsub -W group_list=scuba2 -q smp -l nodes=smp1:ppn=8 -I</tt>. Again, the reservation will be used if the resources are available, and your job will otherwise go into the general pool.
* If you no longer need a reservation, e-mail the administrators to ask them to delete it.
063d21b0d759109e2b2797971b828af16d670435
326
325
2013-05-17T08:04:04Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no jobs other than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically. Please only ask for a reservation if you believe that there is no method of doing what you want within the standard [[Jobs|queueing system]].
Currently members of the CTCA group are able to reserve the smp machines themselves by using the command
<pre>
sudo -u maui /usr/local/bin/reserve-ctca.sh [machine name] [start time] [end time]</pre>
where <tt>[start time]</tt> and <tt>[end time]</tt> have the format [HH[:MM[:SS]]][_MO[/DD[/YY]]], e.g. 14:30_06/20 means 14:30 on the 20th of June this year.
Reservations will usually be for a group of people, but may be for an individual. If you need to use a group reservation, you will need to know the name of the group in question, and you will need to belong to that group. Typing <tt>groups</tt> at a shell prompt on the head node will tell you what groups you belong to.
General guidelines for reservations are as follows:
* If creating a reservation yourself, reserve the machine(s) for a period when no existing jobs will be running -- use <tt>qstat</tt> to determine when this will be.
* If you are using a personal reservation, simply submit a job as normal: any reservation available to you will be used automatically. If you need an interactive session, use an [[interactive jobs|interactive job]]; e.g. if you have reserved smp2 for two days, do <tt>qsub -I -q smp -l nodes=smp2:ppn=48 -l walltime=48:00:00</tt>.
* If you are using a group reservation, specify that you want to use it by adding the option <tt>-W group_list=[groupname]</tt> to the <tt>qsub</tt> command or script. E.g. to use 8 cores of the <tt>scuba2</tt> group reservation on smp1 interactively, do <tt>qsub -W group_list=scuba2 -q smp -l nodes=smp1:ppn=8 -I</tt>. Again, the reservation will be used if the resources are available, and your job will otherwise go into the general pool.
* If you no longer need a reservation, e-mail the administrators to ask them to delete it.
8883aa2e86e6f2f4de796e71d208f6d7b20c9b25
Gromacs
0
19
329
105
2013-06-14T13:42:01Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is a software package for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
Look here for [[groperform|optimising performance]].
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs-new/bin/GMXRC
# used previously: export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
export LD_LIBRARY_PATH='/usr/mpi/gcc/mvapich2-1.6/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
db110de53076f81d5309c60e1abc500d7da1b72e
330
329
2013-06-14T13:44:28Z
Akukol
3
/* How to perform a simulation with Gromacs' mdrun: */
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is a software package for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default walltime on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
Look here for [[groperform|optimising performance]].
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -k oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# produce the output files while the job is running (-k oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs-new/bin/GMXRC
# used previously: export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
export LD_LIBRARY_PATH='/usr/mpi/gcc/mvapich2-1.6/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
e4ffe0030b698f2d993191e75fae720158fa0b50
Known problems
0
25
331
296
2013-06-22T16:46:19Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and bandwidth lower than it should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* The scheduler sometimes crashes for unknown reasons causing jobs not to run.
9546f0b58cfd966b92b7eeb637e755c214d36ae3
352
331
2013-10-01T12:47:27Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and bandwidth lower than it should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+infiniband is unstable.
* The scheduler sometimes crashes for unknown reasons causing jobs not to run. (Regularly run scripts check and restart the scheduler.)
* The scheduler very occasionally will not run a job that could be run immediately in free resources. If <tt>checkjob</tt> shows 'job can run' and <tt>showstart</tt> shows an immediate start, but your job does not run, then please contact the [[administrators]].
aa7bc1fbc040b995f6fdce8e8efd5030b1a9ff8a
Architecture
0
7
332
291
2013-06-22T16:53:17Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 16 Xeons (E5-2660s) 2 socket x 8-core with 32 Gb RAM and FDR Infiniband form the rest of the main cluster and some dedicated CAR nodes (chassis 9)
* Two [[SMP machines]] each with 48 cores (4 sockets x 12 cores, Opteron 6174), 256 Gb RAM and QDR Infiniband (smp1 and smp2).
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 9 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development).
* 345 Tb of [[storage]] attached via Fibre Channel to the head node.
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
c91df12f9070cf9edfa041e3649b64336b9fc6fb
350
332
2013-09-27T10:51:54Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 16 Xeons (E5-2660s) 2 socket x 8-core with 32 Gb RAM and FDR Infiniband form the rest of the main cluster and some dedicated CAR nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 Gb RAM and QDR Infiniband (smp1-3).
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 345 Tb of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 5 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development).
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
4629217c32135e42840ecdf9e0d91742b22c14be
351
350
2013-10-01T12:44:10Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 16 Xeons (E5-2660s) 2 socket x 8-core with 32 Gb RAM and FDR Infiniband form the rest of the main cluster and some dedicated CAIR nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 Gb RAM and QDR Infiniband (smp1-3).
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 345 Tb of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* 5 [[sandbox machines]] (lower-power machines with 1 Gb/core and no Infiniband, intended for testing and development).
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
65c25f176297776113f3d19f9168628d2d110c3c
371
351
2014-02-11T21:32:28Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 Gb RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 16 Xeons (E5-2660s) 2 socket x 8-core with 32 Gb RAM and FDR Infiniband form the rest of the main cluster and some dedicated CAIR nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 Gb RAM and QDR Infiniband (smp1-3).
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 345 Tb of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
4750a75ebc18777c4501bc8ebbfc994779c57749
CASA
0
28
334
268
2013-06-25T13:26:07Z
Mjh
2
wikitext
text/x-wiki
CASA is software for radio astronomy data reduction. Various versions are installed on the cluster. The latest version is always in <tt>/soft/casapy</tt> (a symbolic link to the real directory).
To use casa, do <tt>module load casa</tt> and then run it with <tt>casapy</tt>.
You should not run CASA on the head node: either run it through the batch job system or use an [[interactive jobs|interactive job]].
3a4c06990b33121bd4aaa176cc11962be150d89a
LOFAR
0
47
335
2013-06-25T13:27:23Z
Mjh
2
Created page with "To run LOFAR software, do <tt>module load LOFAR</tt> <tt>source /soft/lofar/lofarinit.csh</tt>"
wikitext
text/x-wiki
To run LOFAR software, do
<tt>module load LOFAR</tt>
<tt>source /soft/lofar/lofarinit.csh</tt>
c79c0a807adf24aed35ae8aa9e56f989c3ef43f9
336
335
2013-06-25T13:27:33Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<tt>module load lofar</tt>
<tt>source /soft/lofar/lofarinit.csh</tt>
413f0927aede380eef8abca928dc01b89ec84c7d
337
336
2013-06-25T14:20:51Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<tt>module load lofar</tt>
<tt>source /soft/lofar/lofarinit.csh</tt>
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
2ab6eede147113a5479268426ca9df2931e36b01
348
337
2013-09-05T12:14:51Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<pre>
module load lofar
source /soft/lofar/lofarinit.csh
</pre>
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
The 'new' <tt>awimager</tt> is installed. Because it uses its own (partial) installation of the LOFAR software, it needs to be run separately. Do
<pre>
module load awimager
source /soft/awimager/lofarinit.csh
</pre>
to give you access to the <tt>awimager</tt> command. Note that the main difference between this version of <tt>awimager</tt> and the standard one that comes with the LOFAR software is that it runs multi-threaded using [[OpenMP]]. You will need to set up use of threads appropriately as described in the [[OpenMP]] page if you want to run this version.
f3a72c670629f39526175273152a7554d24f80a2
349
348
2013-09-05T12:16:05Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<pre>
module load lofar
source /soft/lofar/lofarinit.csh
</pre>
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
The 'new' <tt>awimager</tt> is installed. Because it uses its own (partial) installation of the LOFAR software, it needs to be run separately. Do
<pre>
module load awimager
source /soft/awimager/lofarinit.csh
</pre>
to give you access to the <tt>awimager</tt> command. Note that the main difference between this version of <tt>awimager</tt> and the standard one that comes with the LOFAR software is that it runs multi-threaded using [[OpenMP]]. You will need to set up use of threads appropriately as described in the [[jobs]] page if you want to run this version.
80d586bafbe777823dcc22b8b66e29fbca12241f
Access
0
5
338
251
2013-07-04T11:27:23Z
Mjh
2
/* Access */
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]] of the cluster is accessible from within the university by ssh to stri-cluster.herts.ac.uk, once you have an [[accounts|account]] set up.
If you are working from a Unix desktop, you should be able to type <tt>ssh username@stri-cluster.herts.ac.uk</tt>. If you are using Windows, you will need a Windows ssh client: we recommend PuTTY[http://www.chiark.greenend.org.uk/~sgtatham/putty/].
Unless specific authorization from the [[administrators]] is provided to the contrary, individual compute nodes must be accessed either through batch [[jobs]] or via [[interactive jobs]] run on the head node: see also the [[policies|policy]] relating to this.
dba255fefd8cfa97684f76feb3a61dbe90857467
Local disk space
0
48
341
2013-07-28T16:42:36Z
Mjh
2
Created page with "The main compute nodes have a limited amount of local disk space (around 50 Gb for nodes001-080 and 110 Gb for nodes081-144). This area is mounted on /local and is only visible i..."
wikitext
text/x-wiki
The main compute nodes have a limited amount of local disk space (around 50 Gb for nodes001-080 and 110 Gb for nodes081-144). This area is mounted on /local and is only visible internally to the node.
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to copy some data to the nodes, do some I/O intensive operations on it and copy it back to the storage. In this case you may use the /local area.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use, using the <tt>file</tt> resource option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,file=10gb</pre>
* You must create a directory in /local in which your job will work as part of your <tt>qsub</tt> script, which will be unique to your job. For example, you might want to do
<pre>
mkdir /local/$PBS_JOBID
cd /local/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* When your job finishes, it must clean up the file space it used before exiting; no files may be left in <tt>/local</tt>. (A complete sketch combining these steps is given below.)
Note that these rules do not apply to the <tt>/scratch</tt> directories on the [[SMP machines]].
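A minimal sketch combining the steps above; the data paths under /stri-data and the program name are placeholders:
<pre>
#!/bin/sh
#PBS -l nodes=1,file=10gb
#PBS -l walltime=04:00:00
# stage data to local disk, work on it, copy results back, then clean up
mkdir /local/$PBS_JOBID
cd /local/$PBS_JOBID
cp /stri-data/fred/input.dat .
/home/fred/bin/process input.dat output.dat
cp output.dat /stri-data/fred/
cd /
rm -rf /local/$PBS_JOBID
</pre>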
19af43dd61db15382d7e196aeb1b26313837b2a9
Storage
0
8
342
281
2013-07-28T16:49:14Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 58 Tb of scratch for CAIR users only, mounted as /cair-scratch
* 59 Tb of scratch for CAIR users only, mounted as /cair-data
* 167 Tb of scratch for CAR users only, mounted as /car-data
* 77 Tb of scratch for CAIR users only, mounted as /cair-work
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
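For example, assuming you have created a working directory of your own on one of the scratch areas, a symbolic link keeps it reachable from your home directory:
<pre>
mkdir -p /stri-data/$USER/results
ln -s /stri-data/$USER/results ~/results
</pre>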
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space, since large datasets will be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node).
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
79ec6a6f1ecef9c8e402b7da195863ff23536919
SMP machines
0
24
343
216
2013-08-13T16:44:21Z
Mjh
2
wikitext
text/x-wiki
The SMP machines are:
* smp1, smp2: two 4-processor, 48-core systems each with 256 Gb of RAM, available for general use. The individual cores in these machines are slightly slower (~ 20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes. So users are recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
* smp3: one 4-processor, 32-core system with 2.2-GHz E5-4620 Intel CPUs and 256 Gb RAM available to CAR users only.
The big advantage of the SMP machines is the large amount of physical memory visible to all cores. This allows for multi-threaded, shared-memory applications.
The SMP machines each also have a large amount of local scratch space (10 Tb for smp1/2, 30 Tb for smp3), which is mounted as /scratch on the machine itself and visible as /smp1, /smp2 and /smp3 on the head node. smp3 is intended for data reduction for CAR users only.
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque.
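For example, a whole 48-core SMP node could be requested for a day with something like this (the script name is a placeholder):
<pre>
qsub -q smp -l nodes=smp1:ppn=48 -l walltime=24:00:00 myjob.sh
</pre>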
b4fb272448695c83b4eee874fb41c227ebad5a30
Python packages
0
49
347
2013-08-16T09:30:53Z
Mjh
2
Created page with "Local python packages installed in <tt>/soft</tt> include * astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz] * kapteyn [http://www.astro.rug.nl/so..."
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
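For example, in a bash shell or at the top of a qsub script:
<pre>
export PYTHONPATH=/soft/python/lib64/python2.7/site-packages:$PYTHONPATH
</pre>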
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
f5ef9ecb80cabba4d4e9bc3bfeb59bd6ae886a44
Why doesn't my job run?
0
37
353
260
2013-10-01T12:50:41Z
Mjh
2
wikitext
text/x-wiki
If you submit a [[jobs|job]] and it stays in the queue, it may not be clear why nothing is happening. Obviously one possibility is that there is a problem with the cluster: but it is also possible that things are going on that you are not seeing.
To ask the scheduler what is going on with a job you can use the command <tt>/usr/local/maui/bin/checkjob</tt>. A typical <tt>checkjob</tt> run might look like this (note the <tt>-v</tt> option):
<pre>
/usr/local/maui/bin/checkjob -v 123456
checking job 123456 (RM job '123456.stri-cluster.herts.ac.uk')
State: Idle
Creds: user:fred group:fred class:main qos:DEFAULT
WallTime: 00:00:00 of 7:00:00:00
SubmitTime: Fri Jul 8 09:04:48
(Time Queued Total: 00:38:52 Eligible: 00:38:52)
Total Tasks: 24
Req[0] TaskCount: 24 Partition: ALL
Network: [NONE] Memory >= 1024M Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [main]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeAccess: SHARED
TasksPerNode: 8 NodeCount: 3
IWD: [NONE] Executable: [NONE]
Bypass: 63 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 24.00 StartPriority: 2513
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 24 procs found)
idle procs: 732 feasible procs: 0
Rejection Reasons: [Features : 58][CPU : 22][State : 18][ReserveTime : 8]
Detailed Node Availability Information:
node001 rejected : ReserveTime
node002 rejected : ReserveTime
node003 rejected : ReserveTime
node004 rejected : State
node005 rejected : ReserveTime
node006 rejected : ReserveTime
node007 rejected : ReserveTime
node008 rejected : ReserveTime
node009 rejected : ReserveTime
node010 rejected : CPU
node011 rejected : CPU
node012 rejected : CPU
node013 rejected : State
node014 rejected : CPU
node015 rejected : CPU
node016 rejected : CPU
node017 rejected : State
node018 rejected : State
node019 rejected : State
node020 rejected : State
node021 rejected : State
node022 rejected : State
node023 rejected : State
node024 rejected : State
node025 rejected : State
node026 rejected : State
node027 rejected : State
node028 rejected : State
node029 rejected : State
node030 rejected : State
node031 rejected : State
node032 rejected : CPU
node033 rejected : CPU
node034 rejected : CPU
node035 rejected : CPU
node036 rejected : CPU
node037 rejected : CPU
node038 rejected : CPU
node039 rejected : CPU
node040 rejected : CPU
node041 rejected : State
node042 rejected : CPU
node043 rejected : CPU
node044 rejected : CPU
node045 rejected : CPU
node046 rejected : CPU
node047 rejected : CPU
node048 rejected : CPU
node049 rejected : Features
node050 rejected : Features
node051 rejected : Features
node052 rejected : Features
node053 rejected : Features
node054 rejected : Features
node055 rejected : Features
node056 rejected : Features
node057 rejected : Features
node058 rejected : Features
node059 rejected : Features
node060 rejected : Features
node061 rejected : Features
node062 rejected : Features
node063 rejected : Features
node064 rejected : Features
node065 rejected : Features
node066 rejected : Features
node067 rejected : Features
node068 rejected : Features
node069 rejected : Features
node070 rejected : Features
node071 rejected : Features
node072 rejected : Features
node073 rejected : Features
node074 rejected : Features
node075 rejected : Features
node076 rejected : Features
node077 rejected : Features
node078 rejected : Features
node079 rejected : Features
node080 rejected : Features
sandbox1 rejected : Features
sandbox2 rejected : Features
sandbox3 rejected : Features
sandbox4 rejected : Features
sandbox5 rejected : Features
sandbox6 rejected : Features
sandbox7 rejected : Features
sandbox8 rejected : Features
sandbox9 rejected : Features
sandbox10 rejected : Features
node081 rejected : Features
node082 rejected : Features
node083 rejected : Features
node084 rejected : Features
node085 rejected : Features
node086 rejected : Features
node087 rejected : Features
node088 rejected : Features
node089 rejected : Features
node090 rejected : Features
node091 rejected : Features
node092 rejected : Features
node093 rejected : Features
node094 rejected : Features
node095 rejected : Features
node096 rejected : Features
job cannot run in partition SMP (insufficient idle procs available: 0 < 24)
</pre>
How do you interpret all this output?
First of all, if your output does not look anything like this, for example if you see an error message about not being able to contact a server, then the scheduler is not running or there is some other problem: see [[Known problems]]. If you need help in this situation, contact one of the [[administrators]].
Assuming your output looks like the above, first check that the details of your job agree with what you think you submitted. Check <tt>NodeCount</tt> and <tt>TasksPerNode</tt> and the <tt>WallTime</tt> request. You may also want to check the output of <tt>qstat -f <jobid></tt>.
Now go down to the reason why <tt>job cannot run in partition DEFAULT</tt>. Normally, this will be as above: this is telling you that there are not enough available CPUs for your job. But suppose you think the cluster is close to empty. Why is this?
Look down the list of individual nodes to see why each one is rejecting your job. The example above shows the common reasons:
* Features: this means that you have asked for a feature that the node in question doesn't have. For example, jobs submitted to the main cluster will never run on the [[sandbox]] machines. It is normal for many nodes to reject your job for this reason.
* State: usually this means that the node in question is marked busy by the system, which means that it is running as many jobs as possible. In this example, there are a number of busy nodes. It can also mean that nodes are down, i.e. that there is a real problem. If many nodes reject your job because of 'State' but you think they cannot be busy, check <tt>pbsnodes -l</tt> to see if they are 'down' and report a problem if so. Nodes that are 'offline' in <tt>pbsnodes -l</tt> have been taken offline by the administrators for maintenance and there is no need to report them unless you think this is an error.
* CPU: the node is not busy, but it has fewer free CPUs than you have requested. This is most likely because it is running one or more jobs. In the example above, the user has asked for 3 nodes with 8 CPUs each: the nodes rejecting the job because of CPU do not have 8 free CPUs. (If all nodes are rejecting your job for this reason, check whether you are asking for an impossible configuration. Jobs submitted to the main cluster asking for more than 16 CPUs on a single node will sit there forever waiting for a machine with a suitable number of cores to become available. Does your job really need 3 machines with 8 CPUs, or would it work with 24 CPUs distributed over the cluster?)
* ReserveTime: your job asks to run on the node at a time when it is reserved for another purpose. This could be due to another job ahead of you in the queue, or it could be a system reservation. If you see this message but there are no jobs ahead of you in the queue, check your e-mail for a recent message from the administrators regarding scheduled downtime.
If you are seeing all of these and you can convince yourself that the individual nodes are rejecting your jobs for valid reasons, then the only thing you can do is wait for the conditions that are blocking your job to clear. Only if you can't convince yourself of this should you contact the administrators.
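As a quick checklist, the commands mentioned above are (123456 being the placeholder job id used in the example):
<pre>
/usr/local/maui/bin/checkjob -v 123456   # ask the scheduler why the job is idle
qstat -f 123456                          # confirm the resources you actually requested
pbsnodes -l                              # list nodes that are down or offline
</pre>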
5d2c4c3d28374bf9008d49c274c6b0b656c60aad
Star-CCM+
0
50
354
2013-10-22T10:40:10Z
Jonnya
9
Created page with "Star-CCM+ is an engineering package which can be used to solve CDF problems. This guide (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster."
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This guide (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster.
3c97108c4214cce0c5d9bf0af2b2d3d885418309
355
354
2013-10-22T11:05:38Z
Jonnya
9
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This [http://{{SERVERNAME}}/docs/starccm.pdf guide ] (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster.
efcf9d3d81baae209b4168d4b08dd1b99326777f
356
355
2013-10-22T11:15:36Z
Jonnya
9
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This [http://{{SERVERNAME}}/docs/starccm.pdf guide] (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster.
The following files are those listed in the guide:
*[http://{{SERVERNAME}}/docs/queue_set.sh queue_set.sh]
*[http://{{SERVERNAME}}/docs/starccm_start.sh starccm_start.sh]
*[http://{{SERVERNAME}}/docs/run.java run.java]
*[http://{{SERVERNAME}}/docs/surf_mesh.java surf_mesh.java]
*[http://{{SERVERNAME}}/docs/sv_mesh.java sv_mesh.java]
*[http://{{SERVERNAME}}/vol_mesh.java vol_mesh.java]
9cc78ba096572c027430239b2a21be25723077f6
357
356
2013-10-22T11:16:26Z
Jonnya
9
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This [http://{{SERVERNAME}}/docs/starccm.pdf guide] (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster.
The following files are those listed in the guide:
*[http://{{SERVERNAME}}/docs/queue_set.sh queue_set.sh]
*[http://{{SERVERNAME}}/docs/starccm_start.sh starccm_start.sh]
*[http://{{SERVERNAME}}/docs/run.java run.java]
*[http://{{SERVERNAME}}/docs/surf_mesh.java surf_mesh.java]
*[http://{{SERVERNAME}}/docs/sv_mesh.java sv_mesh.java]
*[http://{{SERVERNAME}}/docs/vol_mesh.java vol_mesh.java]
6cc3aa20d782dad77a172c9e37929da17415175a
359
357
2013-10-22T11:32:26Z
Jonnya
9
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This [http://{{SERVERNAME}}/docs/starccm.pdf guide] (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster.
The following files are those listed in the guide:
*[http://{{SERVERNAME}}/docs/queue_set.sh queue_set.sh]
*[http://{{SERVERNAME}}/docs/starccm_start.sh starccm_start.sh]
*[http://{{SERVERNAME}}/docs/run.java run.java]
*[http://{{SERVERNAME}}/docs/surf_mesh.java surf_mesh.java]
*[http://{{SERVERNAME}}/docs/sv_mesh.java sv_mesh.java]
*[http://{{SERVERNAME}}/docs/vol_mesh.java vol_mesh.java]
ad61c31b781a2fe2598a551994351e21bbcc3c5c
Main Page
0
1
358
316
2013-10-22T11:18:31Z
Jonnya
9
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Sandbox]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
526ec5d92b741db3bf73fcc4e95fa587ac950fd6
370
358
2014-02-11T21:31:42Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
d7372ad6f1135bac84a617fbc3c94e4f892f4c9a
User:Dbab
2
51
362
2013-11-26T13:24:26Z
Mjh
2
Creating user page with biography of new user.
wikitext
text/x-wiki
da39a3ee5e6b4b0d3255bfef95601890afd80709
User talk:Dbab
3
52
363
2013-11-26T13:24:27Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [[Help:Contents|help pages]].
Again, welcome and have fun! [[User:Mjh|Mjh]] 13:24, 26 November 2013 (GMT)
9a9fb75e32ecf12a9a75fd8a7dfb57058386fefc
Neuron
0
53
365
2013-11-26T17:55:13Z
Dbab
11
Created page with "neuron is installed in /soft/nrn to run neuron you should have the library path on path. To do so run <pre> setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib </pre> To make t..."
wikitext
text/x-wiki
neuron is installed in /soft/nrn
to run neuron you should have the library path on path. To do so run
<pre>
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib
</pre>
To make this change permanent for new connections run
<pre>
echo 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib'>>~/.tcshrc
</pre>
after that you can run
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
2deab1369cd333677933b42a9dfd51196db08850
366
365
2013-11-26T17:55:46Z
Dbab
11
wikitext
text/x-wiki
neuron is installed in /soft/nrn
to run neuron you should have the library path on path. To do so run
<pre>
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib
</pre>
To make this change permanent for new connections run
<pre>
echo 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib'>>~/.tcshrc
</pre>
Now you can run neuron using
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
de558cc8011e5817cbfacfd4fabca572876b745c
367
366
2013-11-26T17:58:55Z
Dbab
11
wikitext
text/x-wiki
neuron is installed in /soft/nrn
to run neuron you should have the library path on path. To do so run
<pre>
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib
</pre>
To make this change permanent for new connections run
<pre>
echo 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib'>>~/.tcshrc
</pre>
Now you can run neuron using
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
To run experiments you need to run it through [[Jobs]] though.
b5f308ff5bef18993bbbbbad238a467c414b2b2b
368
367
2013-11-26T17:59:58Z
Dbab
11
wikitext
text/x-wiki
neuron is installed in /soft/nrn
to run neuron you should have the library path on path. To do so run
<pre>
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib
</pre>
To make this change permanent for new connections run
<pre>
echo 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib'>>~/.tcshrc
</pre>
Now you can run neuron using
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
But don't run experiments directly. To do so you need to use [[Jobs]].
c8d1dbf6a8e87d422ca98171b6bc370c2db55570
369
368
2013-11-26T18:00:31Z
Dbab
11
wikitext
text/x-wiki
Neuron is installed in /soft/nrn.
To run neuron you need /soft/lib on your library path. Set it by using
<pre>
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib
</pre>
To make the change permanent for new connections run
<pre>
echo 'setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/soft/lib'>>~/.tcshrc
</pre>
Now you can run neuron using
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
But don't run experiments directly. To do so you need to use [[Jobs]].
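A minimal qsub script sketch for doing that (the working directory and hoc file are hypothetical; see [[Jobs]] for what the options mean):
<pre>
#!/bin/sh
#PBS -N neuron-run
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -j oe
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/soft/lib
cd /home/fred/neuron-experiment           # hypothetical working directory
/soft/nrn/x86_64/bin/nrniv my_model.hoc   # hypothetical hoc file
</pre>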
f75e0340a31458f66b545c99ba56c7b41ab91a05
Vina
0
23
372
129
2014-02-20T15:09:45Z
Akukol
3
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
''Andreas''
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N $f -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
38bb375b527357c5f1565e83c1f1ccc42338636e
Autodock
0
22
373
149
2014-02-20T15:10:25Z
Akukol
3
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
''Andreas''
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT="-j oe -N AutoDock" # join output and error, job name: Autodock
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtuals screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with thousands of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
4c97eaf011bc960cc4dd4a4a4e73907f52f7814a
IGemDock
0
21
374
133
2014-02-20T15:11:21Z
Akukol
3
wikitext
text/x-wiki
IGemDock is a user interface to [http://gemdock.life.nctu.edu.tw/dock/ Gemdock] for molecular docking.
First you need to execute
'export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH'
(for tcsh, put the equivalent <tt>setenv</tt> command in your .cshrc so that it is set automatically when you log in)
Start with '/soft/iGEMDOCKv2.1-centos/bin/iGemdock'
The molecular docking engine is '/soft/iGEMDOCKv2.1-centos/bin/mod_ga'
Gemdock runs one process only (on one CPU core).
This is the RunGemdock.sh script that you need (remember to make it executable):
<pre>#!/bin/sh
#PBS -N GemD_comt2
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -j oe
#PBS -u akukol
#PBS -l walltime=250:00:00
export LD_LIBRARY_PATH=/soft/iGEMDOCKv2.1-centos/bin:$LD_LIBRARY_PATH
cd /home/akukol/data/vscreenTest/comt2_gemdock
### This is the command ###
/usr/local/bin/mpiexec /soft/iGEMDOCKv2.1-centos/bin/mod_ga -f docking.dock
### command end ###
# start with 'qsub RunGemdock.sh'
</pre>
''Andreas''
f2f590a9e6b13a7a31c7b7055804eccc529a596e
Gromacs
0
19
375
330
2014-02-20T15:12:12Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is a software for molecular dynamics simulation. It treats molecules as particles in a classical mechanics forcefield.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use the same version of Gromacs.
2) Prepare a c-shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the number of nodes to use (#PBS -l, each node has 8 CPU cores), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Note that the default on the 'main' queue is 24 hours. If you don't specify any [[Queues|walltime]], the job will stop after 24 hours.
Look here for [[groperform|optimising performance]].
''Andreas''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00
#PBS -j oe
#PBS -k oe
#PBS -u akukol
# runs a job with name 'GromacsTest' on the 'main' cluster
# uses 1 node and 8 CPUs (each node has 8 CPUs)
# set a maximum time of two hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# produce the output while the job is running (-k oe)
# specifies user 'akukol'
# set required paths:
source /soft/gromacs-new/bin/GMXRC
# used previously: export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
export LD_LIBRARY_PATH='/usr/mpi/gcc/mvapich2-1.6/lib:$LD_LIBRARY_PATH'
# specify working directory:
cd /home/akukol/groTest
### This is the command ###
/usr/local/bin/mpiexec mdrun -s md_test.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
32de01bf56a812e6390aca5705d08fadd2f0bb2e
Storage
0
8
376
342
2014-02-25T16:13:59Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster is set up as follows:
* 1 Tb of user home directories, mounted as /home
* 61 Tb of scratch available to all users, mounted as /stri-data
* 58 Tb of scratch for CAIR users only, mounted as /cair-scratch
* 59 Tb of scratch for CAIR users only, mounted as /cair-data
* 167 Tb of scratch for CAR users only, mounted as /car-data
* 77 Tb of scratch for CAIR users only, mounted as /cair-work
There is also an area for software to be shared over the network, mounted as /soft. These paths should work for both the head node and the compute nodes.
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work in /home, use a symbolic link). A [[quota]] system is in place on /home.
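For example (a sketch; the user and directory names are illustrative):
<pre>
mkdir -p /stri-data/fred/bigrun          # working area on the appropriate data disk
ln -s /stri-data/fred/bigrun ~/bigrun    # symbolic link so it is reachable from /home
</pre>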
Please use the scratch space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will regularly be processed on the cluster. (See also [[policies]].)
The [[SMP machines]] have their own local 10-Tb scratch discs: see the relevant page for more information.
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
No data area on the cluster is currently backed up. You must take responsibility for your own backups. In the longer term we expect to be able to make regular backups of /home and possibly some of /stri-data .
e31315b8107b917743de187f340235ea3e8e8a08
Ramdisks
0
54
377
2014-02-25T16:16:32Z
Mjh
2
Created page with "All nodes have a 16-Gb ramdisk set up by default. The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, th..."
wikitext
text/x-wiki
All nodes have a 16-Gb ramdisk set up by default.
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to do local, fast file operations.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use, using the <tt>pmem</tt> option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,pmem=10gb</pre>
* As part of your <tt>qsub</tt> script, you must create a directory in /ramdisk, unique to your job, in which your job will work. For example, you might want to do
<pre>
mkdir /ramdisk/$PBS_JOBID
cd /ramdisk/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* Before exiting, your job must clear up the filespace it has used; no files must be left in <tt>/ramdisk</tt>.
Note that /ramdisk is by nature volatile. When a machine is rebooted, the contents of /ramdisk will be irretrievably lost.
825e15d8e3de2ab9e6d4281f9a93af1203dc9abc
378
377
2014-02-25T16:18:52Z
Mjh
2
wikitext
text/x-wiki
All nodes have a 16-Gb ramdisk set up by default.
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to do local, fast file operations.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use, using the <tt>pmem</tt> option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,pmem=10gb</pre>
* As part of your <tt>qsub</tt> script, you must create a directory in /ramdisk, unique to your job, in which your job will work. For example, you might want to do
<pre>
mkdir /ramdisk/$PBS_JOBID
cd /ramdisk/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* Before exiting, your job must clear up the filespace it has used; no files must be left in <tt>/ramdisk</tt>.
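A sketch of a qsub script that follows these rules (the resource numbers and file names are illustrative):
<pre>
#!/bin/sh
#PBS -N ramdisk-job
#PBS -l nodes=1,pmem=10gb
#PBS -l walltime=01:00:00
mkdir /ramdisk/$PBS_JOBID                # private working directory for this job
cd /ramdisk/$PBS_JOBID
# ... run your code here, writing only inside this directory ...
cp results.dat /home/fred/               # keep anything you need (hypothetical file)
cd /
rm -rf /ramdisk/$PBS_JOBID               # clean up before the job exits
</pre>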
Note that /ramdisk is by nature volatile. When a machine is rebooted, the contents of /ramdisk will be irretrievably lost.
If you want larger, non-volatile local storage, see [[local disk space]].
15035b4308071b478e6d293ac34932775f5f94bc
LOFAR
0
47
379
349
2014-05-06T15:55:11Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<pre>
module load lofar
source /soft/lofar/lofarinit.csh
</pre>
You will need a <tt>.casarc</tt> file, something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
The version of the LOFAR software in /soft/lofar is an old, stable one. For more up-to-date versions, look for /soft/lofar-''date'', where ''date'' is a numeric build date, and source /soft/lofar-''date''/lofarinit.csh instead.
For newer LOFAR versions you may want to make sure that pyrap is on your path, so a full setup might be
<pre>
module load casa
module load lofar
source /soft/lofar-060514/lofarinit.csh
setenv PYTHONPATH /soft/pyrap:$PYTHONPATH
</pre>
1b05138ee613aeafa8107b4ccccbb522f813db09
385
379
2015-01-13T15:44:11Z
Mjh
2
wikitext
text/x-wiki
To run LOFAR software, do
<pre>
module load lofar
source /soft/lofar/lofarinit.csh
</pre>
You will need a <tt>.casarc</tt> file, something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
The version of the LOFAR software in /soft/lofar is an old, stable one. For more up-to-date versions, look for /soft/lofar-''date'', where ''date'' is a numeric build date, and source /soft/lofar-''date''/lofarinit.csh instead.
For newer LOFAR versions you may want to make sure that pyrap is on your path, so a full setup might be
<pre>
module load casa
module load lofar
source /soft/lofar-091114/lofarinit.csh
setenv PYTHONPATH /soft/pyrap:$PYTHONPATH
</pre>
0bf590861d55baf9d47c5caf237e31dc6ababad3
Queues
0
15
380
345
2014-06-06T10:09:31Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available for general use on the system:
* 'main' is the default queue: this submits to the 56 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cmain' submits to CAR or main-cluster nodes. This queue is restricted to CAR users. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 46 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
For the 'cair_s' queue the default wall time is also the maximum: 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
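For example (a sketch; <tt>myjob.sh</tt> is a placeholder script and the times are illustrative):
<pre>
qsub -q main -l walltime=72:00:00 myjob.sh   # main queue, 3-day wall time instead of the 24-hour default
qsub -q smp -l walltime=48:00:00 myjob.sh    # smp queue at its 48-hour maximum
</pre>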
746d0c749b62ca2cf79cce01c7268d190083699b
Software
0
17
381
364
2014-07-01T10:03:36Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/matlab/R2010a/bin</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar</tt>
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
6444bf7d0e3ddcdbdd1cee4e891b0d5d20d075d1
405
381
2015-11-26T10:47:38Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/MATLAB/R2013b/bin/Matlab</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar</tt>
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
9bedebc5b7e7d63c95cfbceb58eea05ea311efaf
Miriad
0
55
382
2014-07-01T10:04:06Z
Mjh
2
Created page with "To access the ATNF Miriad software do <tt>module load miriad</tt>."
wikitext
text/x-wiki
To access the ATNF Miriad software do <tt>module load miriad</tt>.
d64933ad085e4c96a7a521b3120f9ee1bcb76546
Jobs
0
9
383
340
2014-07-10T14:57:43Z
Jonnya
9
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately, while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp between nodes.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it's generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e or in the current working directory of qsub if they are not specified, but it will only appear once the job is finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than the 8 that are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, you are recommended always to ask for and use 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, change your resource request with <tt>qalter</tt>, and move queued jobs between different queues with <tt>qmove</tt>.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`
</pre>
(or, in an sh/bash script, <tt>export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`</tt>) in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
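A sketch of such a generic script (the <tt>process_object</tt> program and paths are hypothetical):
<pre>
cat << 'END' > generic.sh
#!/bin/sh
cd /home/fred/data
./process_object $NAME > $NAME.log
END
qsub -v NAME=NGC383 generic.sh
</pre>
(Quoting 'END' stops the shell expanding $NAME when the script is written, so it is only expanded at run time.)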
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
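For example, a sketch of a script that uses <tt>$PBS_ARRAYID</tt> to keep the outputs of the array members separate (<tt>simulate</tt> is a hypothetical program):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l walltime=04:00:00
cd /home/fred/mc
./simulate --seed $PBS_ARRAYID > run_$PBS_ARRAYID.out
</pre>
saved as, say, myjob.qsub and submitted with <tt>qsub -t 1-100 myjob.qsub</tt> to queue 100 such jobs.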
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
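Since <tt>qsub</tt> prints the id of the job it has just submitted, you can capture it in a shell variable and chain jobs without copying numbers by hand (a sketch; the script names are placeholders):
<pre>
FIRST=`qsub stage1.qsub`
qsub -W depend=afterany:$FIRST stage2.qsub
</pre>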
b528d5cd92ae43036b16ac5d95c96b42d8c0f306
Cluster bibliography
0
30
384
282
2014-09-19T14:27:00Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Kukol A, Hughes DJ, Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, '''2014''', ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B, How the amyloid-β peptide and membranes affect each other: An extensive simulation study, '''2013''', ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
42d5895e85e422cdf2c1c458916294419e6176df
Memory
0
36
386
319
2015-01-16T12:38:54Z
Mjh
2
wikitext
text/x-wiki
Jobs that will use a large amount of physical memory must make sure that they request this in the job submission command or script.
As described in the section on [[architecture]], the nodes of the main cluster have a range of different physical memory sizes. If the total amount of memory used by all jobs running on a given node is larger than, or even starts to approach, this value, then performance will drop, jobs may be unable to allocate memory and will eventually be killed, and the node may even crash (as the Linux out-of-memory killer often appears to leave a node in an unstable state).
To make sure that this doesn't happen to your job (or, worse, that your job causes it to happen to someone else's) you should specify the amount of physical memory used per process, if it is more than the default of 900 Mb, by using the <tt>pmem</tt> attribute in the job control system. So, for example, if you need 8 Gb of memory per process for 8 processes, an example job submission script would look like this:
<pre>
#!/bin/sh -f
#PBS -N large-job
#PBS -m abe
#PBS -l nodes=8
#PBS -l walltime=00:01:00
#PBS -l pmem=8gb
#PBS -k oe
... job commands go here ...
</pre>
This job will attempt to run 8 separate processes, but the scheduling system will not (as it would by default) place them all on the same node, since it knows that the processes would require more memory than is available. They will be placed on separate nodes where their memory requirements will not interfere with each other.
It's the user's responsibility to figure out a sensible memory request for large jobs. Err on the side of generosity, but bear in mind that asking for too much memory will make the scheduling of both your and other users' jobs inefficient. Please bear in mind that the typical cluster job runs very comfortably in 1 Gb. You can see how much physical memory a running job is using by doing <tt>qstat -f <jobid></tt>: the line <tt>resources_used.mem</tt> tells you the total memory use for all processes.
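For example (123456 is a placeholder job id):
<pre>
qsub -l nodes=8 -l pmem=8gb job.sh            # command-line equivalent of the script above
qstat -f 123456 | grep resources_used.mem     # check the memory a running job is actually using
</pre>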
Users needing very large amounts of memory should consider using the [[SMP machines]], which have 256 Gb of physical memory. (If doing this, it is still sensible to use the <tt>pmem</tt> job attribute to make sure that you will not interfere with other users.)
2e74eefd25801048924f81dfc831fd60db1e50ed
Architecture
0
7
387
371
2015-01-16T12:40:46Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2 socket x 4-core x 2 Hyperthreads Xeon-based machine with 32 Gb RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 Gb RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 Gb RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 Gb RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 16 Xeons (E5-2660s) 2 socket x 8-core with 16, 32 or 64 Gb RAM and FDR Infiniband form the rest of the main cluster and some dedicated CAIR nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 Gb RAM and QDR Infiniband (smp1-3).
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 345 Tb of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 77 Tb of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A separate server providing an additional 12 Tb of storage for CAIR use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
e9677324e2134df529402726ae0ab74217f71626
397
387
2015-07-26T08:47:17Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2 socket x 4-core x 2 Hyperthreads Xeon-based machine with 32 GB RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 Gb RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A separate server -- [[cair-forecast]] providing an additional 80 Tb of storage and processing for CAIR AQF use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
2897271c01dce4b9527ab3bedd8c4c376cb26dab
398
397
2015-07-26T08:47:52Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 GB RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
e0d7964f0d9c8481734da524bae665ebf84b62f8
Todo
0
56
388
2015-04-08T09:56:18Z
Jonnya
9
Created page with "Upgrade checklist"
wikitext
text/x-wiki
Upgrade checklist
aefc802f499ae3cb2da3a80fe892e9d17b265030
389
388
2015-04-08T10:02:46Z
Jonnya
9
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*Install new CA and generate certs;
*Install slapd and import ldif.
*setup denyhosts;
*setup exim;
*setup httpd;
*setup munin;
*setup ganglia;
*setup torque;
*setup maui;
*setup iptables;
*setup routing;
*setup nfs shares;
*copy /etc/fstab entries;
abddae616406cf6f863f48f90a9c3222834d84ea
390
389
2015-04-08T10:06:43Z
Jonnya
9
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*Install new CA and generate certs;
*Install slapd and import ldif.
*setup denyhosts;
*setup exim;
*setup httpd and copy existing web hierarchy;
*setup munin;
*setup ganglia;
*setup torque;
*setup maui;
*setup iptables;
*setup routing;
*setup nfs shares;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
4e6a815cbbde894b158b02b315897d19d62d5265
391
390
2015-04-08T10:08:57Z
Jonnya
9
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*test existing RAID array with cair-cluster;
*install new CA and generate certs;
*install slapd and import ldif.
*setup denyhosts;
*setup exim;
*setup httpd and copy existing web hierarchy;
*setup munin;
*setup ganglia;
*setup torque;
*setup maui;
*setup iptables;
*setup routing;
*setup nfs shares;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
f373cde144710e12e9a61f57c62baf46c69da692
392
391
2015-04-08T10:22:44Z
Mjh
2
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*test existing RAID array with cair-cluster;
*install new CA and generate certs;
*install slapd and import ldif.
*setup denyhosts;
*setup exim;
*setup httpd and copy existing web hierarchy;
*setup wiki
*setup munin;
*setup ganglia;
*setup torque (copy from old including spool dirs for jobs);
*setup maui (copy from old);
*setup iptables;
*setup routing;
*setup nfs shares;
*setup ntp;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
d56ee06ac14b2e331980255fbe1615713d73f67e
393
392
2015-04-08T16:40:21Z
Jonnya
9
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*test existing RAID array with cair-cluster;
*install new CA and generate certs;
*install slapd and import ldif.
*setup denyhosts;
*setup exim;
*setup httpd and copy existing web hierarchy;
*setup wiki
*setup munin;
*setup ganglia;
*setup torque (copy from old including spool dirs for jobs);
*setup maui (copy from old);
*setup iptables;
*setup routing;
*setup nfs shares;
*setup ntp;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
*rkhunter/tripwire?
863ef89a5ce0089cfe2d33350a1fc39a0e748c1d
394
393
2015-04-09T08:46:11Z
Mjh
2
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*test existing RAID array with cair-cluster;
*install new CA and generate certs;
*install slapd and import ldif.
*setup denyhosts;
*setup dnsmasq
*setup exim;
*setup httpd and copy existing web hierarchy;
*setup wiki
*setup munin;
*setup ganglia;
*setup torque (copy from old including spool dirs for jobs);
*setup maui (copy from old);
*setup iptables;
*setup routing;
*setup nfs shares;
*setup ntp;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
*rkhunter/tripwire?
5959932ec9318f266687d78399a584331f628931
395
394
2015-04-09T08:48:17Z
Mjh
2
wikitext
text/x-wiki
Upgrade checklist (in no particular order);
*test existing RAID array with cair-cluster;
*install new CA and generate certs;
*install slapd and import ldif.
*setup denyhosts;
*setup dnsmasq
*setup exim;
*setup mysql or equivalent
*setup httpd and copy existing web hierarchy;
*setup wiki (in mysql??)
*setup munin;
*setup ganglia;
*setup torque (copy from old including spool dirs for jobs);
*setup maui (copy from old);
*setup iptables;
*setup routing;
*setup nfs shares;
*setup ntp;
*copy /etc/fstab entries;
*copy /etc/rc.d/rc.local (license managers etc)
*copy existing system cron jobs (home backup)
*copy /root (re: scripts )
*rkhunter/tripwire?
3b95cb4c0f19091535d1b860332d06b533c68ce3
Passwordless ssh
0
13
396
71
2015-06-19T07:22:50Z
Mjh
2
wikitext
text/x-wiki
For some applications (including use of the [[jobs|job submission system]]) you will need to enable passwordless ssh between nodes.
The simplest way of doing this is as follows:
* run <tt>ssh-keygen</tt> and generate a key '''with no passphrase''' (just press return when prompted). Using a passphrase will not work!
* cd into your <tt>~/.ssh</tt> directory.
* <tt>cat id_rsa.pub >> authorized_keys</tt>
* Passwordless ssh is now set up: try e.g. <tt>ssh node001 hostname</tt> to test it.
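Putting the steps together, the whole procedure is only a few commands (the node name <tt>node001</tt> is just an example):
<pre>
ssh-keygen                        # accept the defaults; press return for an empty passphrase
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
ssh node001 hostname              # should print the node's name without asking for a password
</pre>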
Note that you are ''not'' permitted to use this to run jobs on the nodes: see [[Policies]] for more.
830c17fd7fd78c695d9139b40a5d31027a5be99f
Main Page
0
1
399
370
2015-10-25T11:37:29Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
e0baac66f0ef7637633c5d610a688df91c2ae999
408
399
2015-12-19T09:42:50Z
Mjh
2
/* How-Tos */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
9d179d5efcdc888dd57993fbb9ff2642cc939fe2
409
408
2015-12-22T13:58:23Z
Mjh
2
/* Troubleshooting */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
b7bf3bad45b01ee9553525d98678791590c1f278
LOFAR-UK Compute Facility
0
57
400
2015-10-25T11:53:20Z
Mjh
2
Created page with "The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, p..."
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to lofar.herts.ac.uk. Data can be downloaded to the dedicated area /data/lofar/.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released.
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
0d6f37bc2d9dca59c076e5385feb6669c1311abb
401
400
2015-10-25T12:08:05Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to lofar.herts.ac.uk. Data can be downloaded to the dedicated area /data/lofar/.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local users need to submit jobs with the option -W group_list=lofar to make use of the reservation.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
6f10e01fb6ee603f153a93ec842d09c190c7e120
402
401
2015-10-26T11:59:29Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
1b8aa1d49d7ace2bbee907590475fa1ea34b61fe
403
402
2015-10-26T12:00:04Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
857c7df7fbe313feb5beb56209501e446923f89f
413
403
2016-01-21T10:56:59Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
7b4715482b89430b800c1bbc83ec677de5561d61
420
413
2016-02-05T12:48:36Z
Wwilliams
13
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of [[running the generic pipeline]] is available.
0f892725dcf6421d126449460c4385d48f75c0dd
422
420
2016-02-05T17:25:37Z
Wwilliams
13
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes with 64 GB RAM and 16 cores.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
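As an illustration, a Herts-based user submitting into the reservation might do something like the following (the resource request and the script name are placeholders, not a recipe):
<pre>
qsub -q main -l nodes=1:ppn=16 -W group_list=lofar myjob.qsub
</pre>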
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
9f7e6390cfe62a310f725409785a2ad77b261459
Administrators
0
6
404
305
2015-11-17T16:57:33Z
Mjh
2
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
These are currently:
* Leigh Smith, l.smith10@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 2E71 (Innovation Centre)).
Contact us with queries. Basic support queries (e.g. account requests, difficulty logging on or using software) should be directed to Leigh in the first instance.
0cd4652683a164ca3367be9b90088adc42c2b9e5
User:Asinha
2
58
406
2015-12-18T13:07:39Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
I'm still working on getting my doctorate in Computer Science at the University of Hertfordshire. I work in computational neuroscience and my interests include plasticity - both structural and synaptic, associative memory, recurrent spiking neural networks, bio-mimetic robotics and so on. There are quite a few other topics I muse about but I generally haven't the time to actually research them at the moment.
I am currently a PhD candidate at the Biocomputation laboratory at the University of Hertfordshire. I study the capacity of associative memory in networks and the effect that plasticity has on it.
7d8a41077f4f2f8dd9b9f57459196f9d3b7c1ba2
User talk:Asinha
3
59
407
2015-12-18T13:07:39Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 13:07, 18 December 2015 (UTC)
9c73e588bdeab4962b25e0a0b626e002e52517ce
Job errors
0
60
410
2015-12-22T14:03:04Z
Mjh
2
Created page with "If you have requested that the job control system e-mail you on error, the most common error that you will see looks like this: <pre> Subject: PBS JOB 12345.stri-cluster.hert..."
wikitext
text/x-wiki
If you have requested that the job control system e-mail you on error, the most common error that you will see looks like this:
<pre>
Subject: PBS JOB 12345.stri-cluster.herts.ac.uk
From: adm@stri-cluster.herts.ac.uk
To: user@stri-cluster.herts.ac.uk
PBS Job Id: 12345.stri-cluster.herts.ac.uk
Job Name: test.qsub
Exec host: node033/4
An error has occurred processing your job, see below.
Post job file processing error; job 12345.stri-cluster.herts.ac.uk on host
node033/4
Unable to copy file
/var/spool/torque/spool/12345.stri-cluster.herts.ac.uk.OU to
user@stri-cluster.herts.ac.uk:/home/user/test.qsub.o12345
*** error from copy
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
lost connection
*** end error output
Output retained on that host in:
/var/spool/torque/undelivered/12345.stri-cluster.herts.ac.uk.OU
</pre>
This is a result of not enabling [[passwordless ssh]]. The job control system tries to copy the job output using ssh and fails to do so. Please make sure you have enabled passwordless ssh before you run any jobs.
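A quick way to check before submitting anything is to try a passwordless command on one of the nodes yourself (the node name is only an example):
<pre>
ssh node001 hostname
</pre>
If this asks for a password, follow the instructions on the [[passwordless ssh]] page before running jobs.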
95eefa08da2a52927bd364b0b78cee01447aece5
Mail
0
18
411
303
2015-12-22T14:05:36Z
Mjh
2
wikitext
text/x-wiki
Various systems on the cluster will want to send you e-mail. By default this will be forwarded to the e-mail address you supplied on account creation.
If you wish to change where the e-mail goes, you should modify the <tt>.forward</tt> file in your home directory: this is a plain text file containing one or more e-mail addresses. Under no circumstances should you remove the file and allow e-mail to remain on the cluster.
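For example, a <tt>.forward</tt> file that redirects all cluster mail to an external address (the address below is just a placeholder) contains nothing but that address:
<pre>
someone@example.com
</pre>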
fb4aeef43b52ba6747c1aa1d4f9bbd8c567638ba
Known problems
0
25
412
352
2015-12-31T09:59:43Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA, so latency is higher and bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+Infiniband is unstable.
* The scheduler sometimes crashes for unknown reasons, causing jobs not to run. (Regularly run scripts check and restart the scheduler.)
* The scheduler very occasionally will not run a job that could be run immediately in free resources. If <tt>checkjob</tt> shows 'job can run' and <tt>showstart</tt> shows an immediate start, but your job does not run, then please contact the [[administrators]].
* Node specifications of the form <tt>nodes=main:ppn=16</tt> or <tt>nodes=smp:ppn=1</tt> will severely confuse the scheduler, although they are valid. Please do not use queue names in node specifications: always do something like <tt>-q main -l nodes=1:ppn=16</tt> instead.
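For example, to ask for one full 16-core node on the main queue without confusing the scheduler (the script name is illustrative):
<pre>
qsub -q main -l nodes=1:ppn=16 myjob.qsub
</pre>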
d525a00836a05f6c18b72819cde7fa154fb6c504
Herts LOFAR HBA pipeline
0
61
414
2016-01-21T11:18:15Z
Mjh
2
Created page with "The Herts HBA pipeline is a set of 'pre-facet' processing steps for HBA data that are adapted to running on the Herts cluster using the [[jobs|job control system]]. All the s..."
wikitext
text/x-wiki
The Herts HBA pipeline is a set of 'pre-facet' processing steps for HBA data that are adapted to running on the Herts cluster using the [[jobs|job control system]].
All the steps in the pipeline process are controlled by a single parameter set in a simple hierarchical format. An example parameter set is shown below.
<pre>
[paths]
unpack=/car-data/mjh/lofar/new
processed=/smp3/mjh/lofar/nw-facet
work=/local/mjh
[files]
calibrator=L221264
target=L221266
[calibration]
flagintbaselines=True
skymodel=/home/mjh/lofar/surveys-pipeline/3C196PANDEYJUL14.skymodel
fitcra=True
notransfer=True
[preflag]
sbrange=0,121
antenna=CS103HBA0
[control]
dryrun=False
beam_applied=False
</pre>
All the scripts are located in <tt>/home/mjh/lofar/surveys-pipeline</tt>.
You should begin by creating your own version of the config file shown here. In what follows <tt>username</tt> represents your login ID on the system. In <tt>[paths]</tt>, <tt>unpack</tt> is the directory containing the tar files for the unprocessed target and calibrator fields; <tt>processed</tt> is the directory where processed data will be stored, normally <tt>/data/lofar/username/...</tt>; and <tt>work</tt> should be a local working directory on the nodes, normally <tt>/local/username</tt>.
<tt>[files]</tt> should give the prefixes (LOFAR IDs) for target and calibrator.
The entries in <tt>[calibration]</tt> are very important. For pre-facet calibration we want <tt>fitcra=True</tt>, <tt>notransfer=True</tt>. If you have a sky model for your calibrator it should be specified in <tt>skymodel</tt>. Any other pre-calibration activity, such as flagging the international baselines, needs to be specified here. Valid options include:
* <tt>antennafix</tt>: default False, run fixbeaminfo
* <tt>antennafix15</tt>: default False, run 2015 fixbeaminfo
* <tt>flagbadweight</tt>: default False, flag WEIGHT_SPECTRUM>1, needed for some Cycle 2 data
* <tt>rficonsole</tt>: default True, run rficonsole
* <tt>skipexisting</tt>: default False, do not run if output data already exist
<tt>[preflag]</tt> should initially be left empty.
84a6d6b8f3d4e30de29831ca09923273956fd221
415
414
2016-01-21T14:42:34Z
Mjh
2
wikitext
text/x-wiki
The Herts HBA pipeline is a set of 'pre-facet' processing steps for HBA data that are adapted to running on the Herts cluster using the [[jobs|job control system]].
All the steps in the pipeline process are controlled by a single parameter set in a simple hierarchical format. An example parameter set is shown below.
<pre>
[paths]
unpack=/car-data/mjh/lofar/new
processed=/smp3/mjh/lofar/nw-facet
work=/local/mjh
[files]
calibrator=L221264
target=L221266
[calibration]
flagintbaselines=True
skymodel=/home/mjh/lofar/surveys-pipeline/3C196PANDEYJUL14.skymodel
fitcra=True
notransfer=True
[preflag]
sbrange=0,121
antenna=CS103HBA0
[control]
dryrun=False
</pre>
All the scripts are located in <tt>/home/mjh/lofar/surveys-pipeline</tt>. This description assumes that you are logged in to the LOFAR-UK head node.
== Step 1: ==
You should begin by creating your own version of the config file shown here. In what follows <tt>username</tt> represents your login ID on the system. In <tt>[paths]</tt>, <tt>unpack</tt> is the directory containing the tar files for the unprocessed target and calibrator fields; <tt>processed</tt> is the directory where processed data will be stored, normally <tt>/data/lofar/username/...</tt>; and <tt>work</tt> should be a local working directory on the nodes, normally <tt>/local/username</tt>.
<tt>[files]</tt> should give the prefixes (LOFAR IDs) for target and calibrator.
The entries in <tt>[calibration]</tt> are very important. For pre-facet calibration we want <tt>fitcra=True</tt>, <tt>notransfer=True</tt>. If you have a sky model for your calibrator it should be specified in <tt>skymodel</tt> — this is important for calibrators that are resolved on the long baselines like 3C196 or 3C295. Any other pre-calibration activity, such as flagging the international baselines, needs to be specified here. Valid options include:
* <tt>antennafix</tt>: default False, run fixbeaminfo
* <tt>antennafix15</tt>: default False, run 2015 fixbeaminfo
* <tt>flagbadweight</tt>: default False, flag WEIGHT_SPECTRUM>1, needed for some Cycle 2 data
* <tt>rficonsole</tt>: default True, run rficonsole
* <tt>skipexisting</tt>: default False, do not run if output data already exist
<tt>[preflag]</tt> should initially be left empty.
<tt>[control]</tt> is used for general control options -- setting <tt>dryrun=True</tt> will mean that commands to be executed by the scripts are not run but only printed (useful for debugging purposes).
== Step 2: ==
The next step is to calibrate the calibrator. Test the process as follows:
<pre>
(LOFAR setup)
/home/mjh/lofar/surveys-pipeline/calib.py config.cfg subband-number
</pre>
LOFAR setup is as described on the [[LOFAR]] page; config.cfg should be the full path to your config file; sub-band number should be a reliable sub-band, say 200. If all is well, this will take a few minutes and will create a copy of the calibrator and target data in your <tt>processed</tt> path. Feel free to inspect the <tt>CORRECTED_DATA</tt> for the calibrator and the amp/phase solutions in the instrument table.
If this single step works, you can proceed to calibrate all the data. Exit the interactive session and do
<pre>
qsub -t 0-365 -v CONFIG=/full/path/to/config.cfg /home/mjh/lofar/surveys-pipeline/run-calib.qsub
</pre>
This runs as many jobs as possible in parallel, so initially you will submit 366 separate jobs to the queue. Use <tt>qstat</tt> to check the progress of the jobs as they pass from queued (Q) to running (R) to completed (C). Each individual job takes only a few minutes. Jobs that complete immediately are a sign of problems. Check the output from these jobs, which will accumulate in your home directory. When everything is completed, check that all data have been written to the processed directory as expected.
=== Step 3: ===
The next steps set things up for Clock-TEC separation. Make sure the LOFAR scripts are on your path as usual, then
<pre>
setenv PYTHONPATH /home/mjh/git:/home/mjh/reinout-scripts_v3:$PYTHONPATH
/home/mjh/lofar/surveys-pipeline/clocktec-prep.py file.cfg
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/find_bad_subband.py cal.h5
</pre>
The <tt>amplitudes_losoto</tt> script will generate some matrices of amplitude solutions vs time and baseline in the working directory. <tt>find_bad_subband</tt> searches these for outliers. You should verify the sub-bands you want to exclude by looking at the matrices, then add a <tt>badsblist</tt> line to the calibration section of your config file, e.g.
<pre>
badsblist=[267, 302, 304, 305, 306, 307, 308, 309, 310, 311]
</pre>
If you find bad ''antennas'' at this point — or antennas that are bad on many baselines — it is best to put them in <tt>preflag</tt> in the config file and redo the calibration from the end of step 2 (i.e. delete everything in your processed directory and redo the 366-band qsub).
Then fit for Clock-TEC:
<pre>
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/fit_clocktec_initialguess_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/examine_npys.py file.cfg
</pre>
Again, look at the plots generated by these scripts. If all looks sensible (core stations have low clock offsets and clock is largely constant per antenna) then continue:
<pre>
/home/mjh/lofar/surveys-pipeline/find_cal_global_phaseoffset.py file.cfg
/home/mjh/lofar/surveys-pipeline/make_template_parmdb.py file.cfg
</pre>
You may now leave the interactive session and apply the solutions to the target:
<pre>
qsub -t 0-366 -q main -W group_list=lofar /home/mjh/lofar/surveys-pipeline/apply-clocktec.qsub -v CONFIG=file.cfg
</pre>
== Step 4: ==
You now need to combine these individual datasets and prepare for facet calibration. TBD...
b8e5d6aae870b9eb404c4ba6c5ccb1393ff2702f
416
415
2016-01-21T14:42:55Z
Mjh
2
wikitext
text/x-wiki
The Herts HBA pipeline is a set of 'pre-facet' processing steps for HBA data that are adapted to running on the Herts cluster using the [[jobs|job control system]].
All the steps in the pipeline process are controlled by a single parameter set in a simple hierarchical format. An example parameter set covering the initial steps is shown below.
<pre>
[paths]
unpack=/car-data/mjh/lofar/new
processed=/smp3/mjh/lofar/nw-facet
work=/local/mjh
[files]
calibrator=L221264
target=L221266
[calibration]
flagintbaselines=True
skymodel=/home/mjh/lofar/surveys-pipeline/3C196PANDEYJUL14.skymodel
fitcra=True
notransfer=True
[preflag]
sbrange=0,121
antenna=CS103HBA0
[control]
dryrun=False
</pre>
All the scripts are located in <tt>/home/mjh/lofar/surveys-pipeline</tt>. This description assumes that you are logged in to the LOFAR-UK head node.
== Step 1: ==
You should begin by creating your own version of the config file shown here. In what follows <tt>username</tt> represents your login ID on the system. In <tt>[paths]</tt>, <tt>unpack</tt> is the directory containing the tar files for the unprocessed target and calibrator fields; <tt>processed</tt> is the directory where processed data will be stored, normally <tt>/data/lofar/username/...</tt>; and <tt>work</tt> should be a local working directory on the nodes, normally <tt>/local/username</tt>.
<tt>[files]</tt> should give the prefixes (LOFAR IDs) for target and calibrator.
The entries in <tt>[calibration]</tt> are very important. For pre-facet calibration we want <tt>fitcra=True</tt>, <tt>notransfer=True</tt>. If you have a sky model for your calibrator it should be specified in <tt>skymodel</tt> — this is important for calibrators that are resolved on the long baselines like 3C196 or 3C295. Any other pre-calibration activity, such as flagging the international baselines, needs to be specified here. Valid options include:
* <tt>antennafix</tt>: default False, run fixbeaminfo
* <tt>antennafix15</tt>: default False, run 2015 fixbeaminfo
* <tt>flagbadweight</tt>: default False, flag WEIGHT_SPECTRUM>1, needed for some Cycle 2 data
* <tt>rficonsole</tt>: default True, run rficonsole
* <tt>skipexisting</tt>: default False, do not run if output data already exist
<tt>[preflag]</tt> should initially be left empty.
<tt>[control]</tt> is used for general control options -- setting <tt>dryrun=True</tt> will mean that commands to be executed by the scripts are not run but only printed (useful for debugging purposes).
== Step 2: ==
The next step is to calibrate the calibrator. Test the process as follows:
<pre>
(LOFAR setup)
/home/mjh/lofar/surveys-pipeline/calib.py config.cfg subband-number
</pre>
LOFAR setup is as described on the [[LOFAR]] page; config.cfg should be the full path to your config file; sub-band number should be a reliable sub-band, say 200. If all is well, this will take a few minutes and will create a copy of the calibrator and target data in your <tt>processed</tt> path. Feel free to inspect the <tt>CORRECTED_DATA</tt> for the calibrator and the amp/phase solutions in the instrument table.
If this single step works, you can proceed to calibrate all the data. Exit the interactive session and do
<pre>
qsub -t 0-365 -v CONFIG=/full/path/to/config.cfg /home/mjh/lofar/surveys-pipeline/run-calib.qsub
</pre>
This runs as many jobs as possible in parallel, so initially you will submit 366 separate jobs to the queue. Use <tt>qstat</tt> to check the progress of the jobs as they pass from queued (Q) to running (R) to completed (C). Each individual job takes only a few minutes. Jobs that complete immediately are a sign of problems. Check the output from these jobs, which will accumulate in your home directory. When everything is completed, check that all data have been written to the processed directory as expected.
=== Step 3: ===
The next steps set things up for Clock-TEC separation. Make sure the LOFAR scripts are on your path as usual, then
<pre>
setenv PYTHONPATH /home/mjh/git:/home/mjh/reinout-scripts_v3:$PYTHONPATH
/home/mjh/lofar/surveys-pipeline/clocktec-prep.py file.cfg
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/find_bad_subband.py cal.h5
</pre>
The <tt>amplitudes_losoto</tt> script will generate some matrices of amplitude solutions vs time and baseline in the working directory. <tt>find_bad_subband</tt> searches these for outliers. You should verify the sub-bands you want to exclude by looking at the matrices, then add a <tt>badsblist</tt> line to the calibration section of your config file, e.g.
<pre>
badsblist=[267, 302, 304, 305, 306, 307, 308, 309, 310, 311]
</pre>
If you find bad ''antennas'' at this point — or antennas that are bad on many baselines — it is best to put them in <tt>preflag</tt> in the config file and redo the calibration from the end of step 2 (i.e. delete everything in your processed directory and redo the 366-band qsub).
Then fit for Clock-TEC:
<pre>
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/fit_clocktec_initialguess_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/examine_npys.py file.cfg
</pre>
Again, look at the plots generated by these scripts. If all looks sensible (core stations have low clock offsets and clock is largely constant per antenna) then continue:
<pre>
/home/mjh/lofar/surveys-pipeline/find_cal_global_phaseoffset.py file.cfg
/home/mjh/lofar/surveys-pipeline/make_template_parmdb.py file.cfg
</pre>
You may now leave the interactive session and apply the solutions to the target:
<pre>
qsub -t 0-366 -q main -W group_list=lofar /home/mjh/lofar/surveys-pipeline/apply-clocktec.qsub -v CONFIG=file.cfg
</pre>
== Step 4: ==
You now need to combine these individual datasets and prepare for facet calibration. TBD...
1e9037dbd6bcfebce6ebd97ce861081887b31028
417
416
2016-01-21T14:43:21Z
Mjh
2
wikitext
text/x-wiki
The Herts HBA pipeline is a set of 'pre-facet' processing steps for HBA data that are adapted to running on the Herts cluster using the [[jobs|job control system]].
All the steps in the pipeline process are controlled by a single parameter set in a simple hierarchical format. An example parameter set covering the initial steps is shown below.
<pre>
[paths]
unpack=/car-data/mjh/lofar/new
processed=/smp3/mjh/lofar/nw-facet
work=/local/mjh
[files]
calibrator=L221264
target=L221266
[calibration]
flagintbaselines=True
skymodel=/home/mjh/lofar/surveys-pipeline/3C196PANDEYJUL14.skymodel
fitcra=True
notransfer=True
[preflag]
sbrange=0,121
antenna=CS103HBA0
[control]
dryrun=False
</pre>
All the scripts are located in <tt>/home/mjh/lofar/surveys-pipeline</tt>. This description assumes that you are logged in to the LOFAR-UK head node.
== Step 1: ==
You should begin by creating your own version of the config file shown here. In what follows <tt>username</tt> represents your login ID on the system. In <tt>[paths]</tt>, <tt>unpack</tt> is the directory containing the tar files for the unprocessed target and calibrator fields; <tt>processed</tt> is the directory where processed data will be stored, normally <tt>/data/lofar/username/...</tt>; and <tt>work</tt> should be a local working directory on the nodes, normally <tt>/local/username</tt>.
<tt>[files]</tt> should give the prefixes (LOFAR IDs) for target and calibrator.
The entries in <tt>[calibration]</tt> are very important. For pre-facet calibration we want <tt>fitcra=True</tt>, <tt>notransfer=True</tt>. If you have a sky model for your calibrator it should be specified in <tt>skymodel</tt> — this is important for calibrators that are resolved on the long baselines like 3C196 or 3C295. Any other pre-calibration activity, such as flagging the international baselines, needs to be specified here. Valid options include:
* <tt>antennafix</tt>: default False, run fixbeaminfo
* <tt>antennafix15</tt>: default False, run 2015 fixbeaminfo
* <tt>flagbadweight</tt>: default False, flag WEIGHT_SPECTRUM>1, needed for some Cycle 2 data
* <tt>rficonsole</tt>: default True, run rficonsole
* <tt>skipexisting</tt>: default False, do not run if output data already exist
<tt>[preflag]</tt> should initially be left empty.
<tt>[control]</tt> is used for general control options -- setting <tt>dryrun=True</tt> will mean that commands to be executed by the scripts are not run but only printed (useful for debugging purposes).
== Step 2: ==
The next step is to calibrate the calibrator. Test the process as follows:
<pre>
(LOFAR setup)
/home/mjh/lofar/surveys-pipeline/calib.py config.cfg subband-number
</pre>
LOFAR setup is as described on the [[LOFAR]] page; config.cfg should be the full path to your config file; sub-band number should be a reliable sub-band, say 200. If all is well, this will take a few minutes and will create a copy of the calibrator and target data in your <tt>processed</tt> path. Feel free to inspect the <tt>CORRECTED_DATA</tt> for the calibrator and the amp/phase solutions in the instrument table.
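The test is meant to be run in an interactive session on a compute node (which is why the next step says to exit it). One way to obtain such a session, as a rough sketch only (the resource request is illustrative), is:
<pre>
qsub -I -q main -W group_list=lofar -l nodes=1:ppn=16
</pre>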
If this single step works, you can proceed to calibrate all the data. Exit the interactive session and do
<pre>
qsub -t 0-365 -v CONFIG=/full/path/to/config.cfg /home/mjh/lofar/surveys-pipeline/run-calib.qsub
</pre>
This runs as many jobs as possible in parallel, so initially you will submit 366 separate jobs to the queue. Use <tt>qstat</tt> to check the progress of the jobs as they pass from queued (Q) to running (R) to completed (C). Each individual job takes only a few minutes. Jobs that complete immediately are a sign of problems. Check the output from these jobs, which will accumulate in your home directory. When everything is completed, check that all data have been written to the processed directory as expected.
== Step 3: ==
The next steps set things up for Clock-TEC separation. Make sure the LOFAR scripts are on your path as usual, then
<pre>
setenv PYTHONPATH /home/mjh/git:/home/mjh/reinout-scripts_v3:$PYTHONPATH
/home/mjh/lofar/surveys-pipeline/clocktec-prep.py file.cfg
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/find_bad_subband.py cal.h5
</pre>
The <tt>amplitudes_losoto</tt> script will generate some matrices of amplitude solutions vs time and baseline in the working directory. <tt>find_bad_subband</tt> searches these for outliers. You should verify the sub-bands you want to exclude by looking at the matrices, then add a <tt>badsblist</tt> line to the calibration section of your config file, e.g.
<pre>
badsblist=[267, 302, 304, 305, 306, 307, 308, 309, 310, 311]
</pre>
If you find bad ''antennas'' at this point — or antennas that are bad on many baselines — it is best to put them in <tt>preflag</tt> in the config file and redo the calibration from the end of step 2 (i.e. delete everything in your processed directory and redo the 366-band qsub).
Then fit for Clock-TEC:
<pre>
/home/mjh/lofar/surveys-pipeline/amplitudes_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/fit_clocktec_initialguess_losoto.py file.cfg
/home/mjh/lofar/surveys-pipeline/examine_npys.py file.cfg
</pre>
Again, look at the plots generated by these scripts. If all looks sensible (core stations have low clock offsets and clock is largely constant per antenna) then continue:
<pre>
/home/mjh/lofar/surveys-pipeline/find_cal_global_phaseoffset.py file.cfg
/home/mjh/lofar/surveys-pipeline/make_template_parmdb.py file.cfg
</pre>
You may now leave the interactive session and apply the solutions to the target:
<pre>
qsub -t 0-366 -q main -W group_list=lofar /home/mjh/lofar/surveys-pipeline/apply-clocktec.qsub -v CONFIG=file.cfg
</pre>
== Step 4: ==
You now need to combine these individual datasets and prepare for facet calibration. TBD...
841d64ad2a5bbf6f703f32be2e73fa59fbbfa1e9
User:Wwilliams
2
62
418
2016-02-05T12:36:01Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
Dr Wendy L. Williams
Postdoctoral Research Assistant
Centre for Astrophysics Research
School of Physics, Astronomy and Mathematics
University of Hertfordshire
PhD in Astronomy
Leiden Observatory
Leiden University
Research interests:
- Multi-wavelength studies of galaxy formation and evolution over cosmic time
- Evolution of active galactic nuclei
- low-frequency radio calibration and imaging (especially LOFAR)
- radio surveys
b0afe0930354bb1ad84e786949e055cb49691f62
User talk:Wwilliams
3
63
419
2016-02-05T12:36:02Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 12:36, 5 February 2016 (UTC)
d06df0830827e935817c13275e68c8c4ec08d78d
Generic pipeline
0
64
421
2016-02-05T17:24:56Z
Wwilliams
13
created
wikitext
text/x-wiki
== The generic pipeline ==
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build, with a few patches applied (<tt>$LOFARROOT=/soft/lofar-070915/</tt>).
* A patch has been made to allow it to run across multiple whole nodes, so that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
* A patch has been made to fix a bug with output locations (in particular for calibrator diagnostic plots and npy files). This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py</tt>, but should be fixed in the latest LOFAR builds (2.15+).
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version instead of the one loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-070915/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
== Running the pipeline ==
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note that <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* The <tt>[remote]</tt> section with <tt>method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
== Pre-Facet calibration ==
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
=== Some known problems ===
* In <tt>argument.flags</tt> there should be no spaces, e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
90b16b4680900b71b5d0e368cc4afbf91254e954
423
421
2016-02-05T17:26:03Z
Wwilliams
13
Wwilliams moved page [[Running the generic pipeline]] to [[Generic pipeline]]
wikitext
text/x-wiki
== The generic pipeline ==
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build, with a few patches applied (<tt>$LOFARROOT=/soft/lofar-070915/</tt>).
* A patch has been made to allow it to run across multiple whole nodes, so that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
* A patch has been made to fix a bug with output locations (in particular for calibrator diagnostic plots and npy files). This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py</tt>, but should be fixed in the latest LOFAR builds (2.15+).
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version instead of the one loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-070915/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
== Running the pipeline ==
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note that <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* The <tt>[remote]</tt> section with <tt>method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
== Pre-Facet calibration ==
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
=== Some known problems ===
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
90b16b4680900b71b5d0e368cc4afbf91254e954
Running the generic pipeline
0
65
424
2016-02-05T17:26:03Z
Wwilliams
13
Wwilliams moved page [[Running the generic pipeline]] to [[Generic pipeline]]
wikitext
text/x-wiki
#REDIRECT [[Generic pipeline]]
6054ba2fedbdb74c6f7c3b24e2fe395f928ce437
Generic pipeline
0
64
425
423
2016-02-05T17:27:24Z
Wwilliams
13
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build, where a few patches have been applied (<tt>$LOFARROOT=/soft/lofar-070915/</tt>).
* A patch has been made to allow it to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
* A patch has been made to fix a bug with output locations (in particular for calibrator diagnostic plots and npy files). This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py</tt>, but should be fixed in the latest LOFAR builds (2.15+).
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-070915/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note that <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* The <tt>[remote]</tt> section with <tt>method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
f458766d07ac14a37530bf925b0be61fd79858e8
426
425
2016-02-09T11:48:09Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build, where a few patches have been applied (<tt>$LOFARROOT=/soft/lofar-070915/</tt>).
* A patch has been made to allow it to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
* A patch has been made to fix a bug with output locations (in particular for calibrator diagnostic plots and npy files). This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/recipes/nodes/python_plugin.py</tt>, but should be fixed in the latest LOFAR builds (2.15+).
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-070915/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
dd58e1f25d1cb2753f03a045b1df59d6575a3e13
427
426
2016-02-16T15:40:45Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-070915/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
5182c135deccf071286d65cc8234a98d0505c7f3
428
427
2016-02-16T15:41:29Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-070915
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-070915/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
f6ad7bcd4bd662868b96cab48bbdcc34f2532a8c
429
428
2016-02-16T15:42:11Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-070915/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
332d43031ab57aa4b0ebc0bd40262705fc893bfc
430
429
2016-02-16T15:42:39Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
7d9eaccf713ad682858f1b0f8289f47a2f7a59d0
431
430
2016-02-16T15:45:35Z
Wwilliams
13
add ssh fix
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
2aa28dbf5f5494c8faa063da08c782b6c332fc20
432
431
2016-02-16T15:47:26Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively, it can be run on part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l pmem=2gb
#PBS -l walltime=24:00:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
be10d639abc59407c674454c1bc04798b37593d4
433
432
2016-02-17T10:50:34Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPI module loaded as standard on the cluster (see also [[MPI|MPI]]). For all the nodes to understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, this should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
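If you use <tt>bash</tt> rather than <tt>csh</tt>, a rough equivalent for your <tt>.bashrc</tt> (a sketch only, assembled from the environment line in the example qsub script below rather than a tested recipe) would be:
<pre>
# swap to the openmpi module required by the pipeline's mpiexec calls
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
# LOFAR environment, mirroring the csh aliases above
module load lofar
. /soft/lofar-050216/lofarinit.sh
export PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/home/mjh/lofar/bin:$PATH
export PYTHONPATH=/soft/pyrap-new:$PYTHONPATH
export LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH
</pre>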
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively, it can be run on part of a node, taking care that your parset and configuration match the resources you are requesting (see the sketch after the example below):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
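When running on part of a node, the obvious setting to adjust is <tt>max_per_node</tt> in the <tt>[remote]</tt> section of the configuration file shown below. An illustrative sketch matching the <tt>ppn=8</tt> request above:
<pre>
[remote]
method = mpiexec
# match the ppn requested from the job control system (nodes=1:ppn=8)
max_per_node = 8
</pre>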
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
7e0eb197849499ec0399f7a4c708eec5a58e7c75
434
433
2016-02-19T15:44:27Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-050216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; module load lofar; source /soft/lofar-050216/lofarinit.csh; setenv PATH /home/mjh/lofar/bin:$PATH; setenv PYTHONPATH /soft/pyrap-new:$PYTHONPATH; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
f2eefcef40cde6c6721d1e069b4ae003c2727365
437
434
2016-03-03T14:27:41Z
Wwilliams
13
/* The generic pipeline */ update paths to latest lofar
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar ; PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:${PATH} ; . /soft/lofar-050216/lofarinit.sh; PATH=/home/mjh/lofar/bin:$PATH; export PATH; PYTHONPATH=/soft/pyrap-new:$PYTHONPATH; export PYTHONPATH ; LD_LIBRARY_PATH=/soft/pyrap-new/lib64:/soft/boost/lib:/soft/casacore-1.7.0/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-050216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-050216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
409a9b05c17026eb6ec0b3d173fe4e772da24582
438
437
2016-03-03T14:29:42Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and configuration file.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
c3efd3b5f0d494e1238bac03947916cd07be175a
439
438
2016-03-03T14:31:52Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
5778aff6b610b98037cf0737e5ddfb7901bedc7c
440
439
2016-03-03T14:34:37Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls that source the LOFAR software are only needed if this is not already done in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command.
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* in argument.flags there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
f09ab79c13c2c897d7ac2ecfdfc7076771935d4e
441
440
2016-03-03T14:37:07Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls that source the LOFAR software are only needed if this is not already done in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command.
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
which can then be submitted with, e.g.:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline
* download scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned in the genericpipeline -> quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with the Pre-Facet-Cal are found). The current cookbook-section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
bd2e207b25de99c6776062fc1e0ccfff8a101fe5
442
441
2016-03-03T14:38:58Z
Wwilliams
13
/* Some known problems */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
Passwordless SSH access between the nodes needs to be set up.
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls that source the LOFAR software are only needed if this is not already done in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command.
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
modulecmd bash load lofar
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
which can then be submitted with, e.g.:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
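With the <tt>runtime_directory</tt> in the example above, the progress of a run can be followed from its pipeline log, for instance:
<pre>
# each run writes its log under runtime_directory/job_name/logs/start_time/ (names from the .cfg above)
tail -f /car-data/wwilliams/pipeline-output/*/logs/*/pipeline.log
</pre>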
= Pre-Facet calibration =
Set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor (a checkout sketch is given after this list). In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
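A sketch of the checkout (the location is arbitrary, but it must match the <tt>recipe_directories</tt> entry in the configuration file above):
<pre>
# fetch the prefactor scripts, plugins and example parsets
git clone https://github.com/lofar-astron/prefactor.git ~/scripts/git/prefactor
# this is the directory listed in recipe_directories in the pipeline .cfg
ls ~/scripts/git/prefactor
</pre>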
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces! e.g.
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right version of the LOFAR software.
c0ab6506403c9fff0bdb3c24d6af5ed006e71485
443
442
2016-03-03T14:41:28Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. At http://www.astron.nl/citt/genericpipeline/#quick-start
there is a description of how to set up and run the generic pipeline. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes, in that you can submit a job which will run on N nodes. The [[jobs|job control system]] will allocate the nodes to you and the generic pipeline will deal with distributing the processes to the other nodes. This patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the one loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, the following should be set in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the lofar commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
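Before launching anything it is worth checking that a non-interactive <tt>ssh</tt> to a compute node (the node name below is only a placeholder) picks up both the <tt>openmpi</tt> module and the LOFAR tools from your <tt>.cshrc</tt>:
<pre>
# both should resolve to the openmpi and LOFAR-build paths respectively
ssh node001 'which mpiexec genericpipeline.py'
</pre>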
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
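A minimal sketch of one way to do this, assuming all nodes mount the same home directory (see [[Passwordless_ssh]] for the full procedure):
<pre>
# create a passphrase-less key if you do not already have one
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorise it for logins to every node that shares this home directory
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
</pre>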
= running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
1ca99409ec408badf1d692261fed91fd66a3904d
444
443
2016-03-15T11:38:36Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. A description of how to set up and run the generic pipeline is given at http://www.astron.nl/citt/genericpipeline/#quick-start. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-220216/</tt>).
* A patch has been made to allow the pipeline to run across multiple whole nodes: you can submit a job which will run on N nodes, the [[jobs|job control system]] will allocate the nodes to you, and the generic pipeline will deal with distributing the processes to the other nodes. The patch is in <tt>$LOFARROOT/lib64/python2.7/site-packages/lofarpipe/support/remotecommand.py</tt>
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPICH version loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, set this in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the LOFAR commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
39ad8a92eec8eaf58d1be5584a5529d1153efc8e
455
444
2016-11-15T15:08:27Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. A description of how to set up and run the generic pipeline is given at http://www.astron.nl/citt/genericpipeline/#quick-start. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-051116/</tt>).
The pipeline can run across multiple whole nodes: you can submit a job which will run on N nodes, the [[jobs|job control system]] will allocate the nodes to you, and the generic pipeline will deal with distributing the processes to the other nodes.
The pipeline sends processes to other available nodes using <tt>mpiexec</tt>, but requires the <tt>openmpi</tt> version rather than the MPICH version loaded as standard on the cluster (see also [[MPI|MPI]]). So that all the nodes understand the <tt>mpiexec</tt> command that the pipeline sends via <tt>ssh</tt>, set this in your <tt>.cshrc</tt> file:
<pre>
module unload mpi/mpich-x86_64
module load mpi/openmpi-x86_64
</pre>
The LOFAR software should also be sourced here, so that all the nodes can run the LOFAR commands, e.g.:
<pre>
alias lofar-newest 'module load lofar ; setenv PATH /soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-220216/lofarinit.csh ; setenv PATH /home/mjh/lofar/bin:$PATH ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:/home/wwilliams/python/face-cal-scripts/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
c8f846d41a9789b58fc9755085182583c9bb8686
456
455
2016-11-15T16:24:19Z
Wwilliams
13
/* The generic pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. A description of how to set up and run the generic pipeline is given at http://www.astron.nl/citt/genericpipeline/#quick-start. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-051116/</tt>).
The pipeline can run across multiple whole nodes: you can submit a job which will run on N nodes, the [[jobs|job control system]] will allocate the nodes to you, and the generic pipeline will deal with distributing the processes to the other nodes. The pipeline sends processes to other available nodes using <tt>ssh</tt>.
The LOFAR software should also be sourced in your bash or csh rc file, so that all the nodes can run the LOFAR commands, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casa-release-4.5.0-el6:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-051116/lofarinit.csh ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = mpiexec
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = mpiexec</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
b58b9bb3404b2de7b3c0d159c9147d6e6e1fc8a1
457
456
2016-11-15T16:25:20Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. A description of how to set up and run the generic pipeline is given at http://www.astron.nl/citt/genericpipeline/#quick-start. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-051116/</tt>).
The pipeline can run across multiple whole nodes: you can submit a job which will run on N nodes, the [[jobs|job control system]] will allocate the nodes to you, and the generic pipeline will deal with distributing the processes to the other nodes. The pipeline sends processes to other available nodes using <tt>ssh</tt>.
The LOFAR software should also be sourced in your bash or csh rc file, so that all the nodes can run the LOFAR commands, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casa-release-4.5.0-el6:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-051116/lofarinit.csh ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-220216/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-220216
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-220216/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = pbs_ssh
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = pbs_ssh</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg (so that the plugins that come with Pre-Facet-Cal are found). The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
eb2ad0ed78ec6f801c30f45371ac5a16dea04a5b
458
457
2016-11-15T16:26:22Z
Wwilliams
13
/* running the pipeline */
wikitext
text/x-wiki
= The generic pipeline =
The generic pipeline comes with LOFAR builds 2.13+. A description of how to set up and run the generic pipeline is given at http://www.astron.nl/citt/genericpipeline/#quick-start. At Herts, the pipeline is included in the most recent LOFAR build (2.15: <tt>$LOFARROOT=/soft/lofar-051116/</tt>).
The pipeline can run across multiple whole nodes: you can submit a job which will run on N nodes, the [[jobs|job control system]] will allocate the nodes to you, and the generic pipeline will deal with distributing the processes to the other nodes. The pipeline sends processes to other available nodes using <tt>ssh</tt>.
The LOFAR software should also be sourced in your bash or csh rc file, so that all the nodes can run the LOFAR commands, e.g.:
<pre>
alias lofar-newest 'setenv PATH /soft/casa-release-4.5.0-el6:/soft/casacore-220216/bin:${PATH} ; source /soft/lofar-051116/lofarinit.csh ; setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-220216/lib64/python2.7/site-packages:$PYTHONPATH ; setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH'
alias lofar-tools 'setenv PYTHONPATH /home/wwilliams/software/lib/python2.7/site-packages/:/home/wwilliams/software/anaconda2/lib/python2.7/site-packages/:$PYTHONPATH'
lofar-newest
lofar-tools
</pre>
To allow multiple versions of the pipeline on different nodes, the ssh host checking needs to be bypassed for localhost. This can be done in your <tt>.ssh/config</tt> file:
<pre>
Host localhost
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
</pre>
[[Passwordless_ssh]] access between the nodes needs to be set up.
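A minimal sketch of one common way to set this up, assuming home directories are shared across the nodes (see the [[Passwordless_ssh]] page for the locally recommended method):
<pre>
# generate a key (press Enter at the passphrase prompts for an empty passphrase)
ssh-keygen -t rsa
# authorise the key for your own account; with shared home directories every
# node will then accept it
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
</pre>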
= Running the pipeline =
The pipeline can be launched by submitting a job to the job control system, e.g.:
<pre>
qsub -l nodes=4:ppn=16 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
where <tt>nodes=4:ppn=16</tt> requests that the job run on 4 entire (16-CPU) nodes.
Alternatively it can be run on a part of a node (being careful to ensure that your parset matches the resources you are requesting):
<pre>
qsub -l nodes=1:ppn=8 -W group_list=lofar ~/survey_pipeline/runfiles/qsub/run_pipe_L400115.qsub
</pre>
An example <tt>qsub</tt> script is:
<pre>
#!/bin/bash
#PBS -N pipeline-L424617
#PBS -l walltime=168:00:00
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-051116/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:/soft/lofar-051116/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
where
<pre>
genericpipeline.py /home/wwilliams/survey_pipeline/parset/PreFacetCal_L424617.parset -v -d -c /home/wwilliams/survey_pipeline/lpipeline.cfg
</pre>
is the actual call to run the generic pipeline. The required arguments are a parset and a configuration file. The calls to source the LOFAR software are only required if the LOFAR environment is not already set up in your <tt>.bashrc</tt>. You can make a more general submit script by passing the parset as an argument and using the <tt>-v</tt> flag on your qsub command, e.g.:
<pre>
#!/bin/bash
#PBS -k o
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: ARRAYID = $PBS_ARRAYID
echo ------------------------------------------------------
echo
echo running generic pipeline with parset $PARSET
echo
echo ------------------------------------------------------
eval $(modulecmd bash load lofar)   # modulecmd prints shell commands that must be eval'ed to take effect
PATH=/soft/casapy-42.2.30986-1-64b:/home/mjh/bin/postgresql/bin:/soft/casacore-220216/bin:${PATH}
. /soft/lofar-051116/lofarinit.sh
PATH=/home/mjh/lofar/bin:$PATH
PYTHONPATH=/soft/pyrap-220216/usr/lib64/python2.7/site-packages:$PYTHONPATH
LD_LIBRARY_PATH=/soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
genericpipeline.py $PARSET -v -d -c /home/wwilliams/survey_pipeline/lpipeline-new.cfg
</pre>
and submit it with, for example:
<pre>
qsub -N L343220-pfc -l walltime=96:00:00 -l nodes=1:ppn=16 -W group_list=lofar -v PARSET=~/survey_pipeline/parset/survey/pfc_L343220.parset ~/survey_pipeline/runfiles/qsub/run_pipe.qsub
</pre>
The parset is the important part: it defines all the steps to be taken.
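As an illustration of the parset style only (the step name here is a made-up example, and the exact key used for the executable should be checked against the genericpipeline quick-start documentation):
<pre>
# a single trivial step that runs /bin/echo on an allocated node
pipeline.steps = [echo_hello]

echo_hello.control.kind       = recipe
echo_hello.control.type       = executable_args
# depending on the genericpipeline version this may need to be
# echo_hello.argument.executable instead
echo_hello.control.executable = /bin/echo
echo_hello.argument.flags     = [hello]
</pre>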
The configuration file should be something like:
<pre>
[DEFAULT]
lofarroot = /soft/lofar-051116
casaroot = /soft/casacore-1.7.0
pyraproot = /soft
hdf5root =
wcsroot = /opt/cep/wcslib
pythonpath = /soft/lofar-051116/lib64/python2.7/site-packages
runtime_directory = /car-data/wwilliams/pipeline-output/
recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/wwilliams/scripts/git/prefactor]
working_directory = /car-data/wwilliams/pipeline-products/
task_files = [%(lofarroot)s/share/pipeline/tasks.cfg]
[layout]
job_directory = %(runtime_directory)s/%(job_name)s
[cluster]
clusterdesc = %(lofarroot)s/share/cep2.clusterdesc
[deploy]
engine_ppath = %(pythonpath)s:%(pyraproot)s/lib:/opt/cep/pythonlibs/lib/python/site-packages
engine_lpath = %(lofarroot)s/lib:%(casaroot)s/lib:%(pyraproot)s/lib:%(hdf5root)s/lib:%(wcsroot)s/lib
[logging]
log_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/pipeline.log
xml_stat_file = %(runtime_directory)s/%(job_name)s/logs/%(start_time)s/statistics.xml
[feedback]
# Method of providing feedback to LOFAR.
# Valid options:
# messagebus Send feedback and status using LCS/MessageBus
# none Do NOT send feedback and status
method = none
[remote]
method = pbs_ssh
max_per_node = 16
</pre>
* Note <tt>[feedback] method = none</tt> is set because the messagebus does not currently work on the Herts cluster.
* <tt>[remote] method = pbs_ssh</tt> allows for the use of multiple nodes.
* The <tt>recipe_directories</tt> should be modified to include the Pre-Facet-Cal recipes (<tt>/home/wwilliams/scripts/git/prefactor</tt>).
= Pre-Facet calibration =
To set up the pre-facet calibration pipeline:
* Download the scripts from https://github.com/lofar-astron/prefactor. In addition to the changes mentioned on the genericpipeline quick-start page, you need to add the Pre-Facet-Cal directory to the "recipe_directories" in the pipeline.cfg, so that the plugins that come with Pre-Facet-Cal are found; a short sketch of these steps is given below this list. The current cookbook section on the pre-facet pipeline can be found at https://github.com/lofar-astron/prefactor/blob/AH-development/docs/cookbook_prefacet.pdf
* Edit the Pre-Facet-Cal.parset appropriately (see the notes in the parset) and run the generic pipeline with this pre-facet-calibration parset.
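A short sketch of the download-and-configure steps (the clone location is just an example, chosen to match the <tt>recipe_directories</tt> path in the configuration above):
<pre>
# fetch the prefactor scripts
mkdir -p ~/scripts/git
cd ~/scripts/git
git clone https://github.com/lofar-astron/prefactor.git

# then make sure pipeline.cfg points at the clone, e.g.
#   recipe_directories = [%(pythonpath)s/lofarpipe/recipes,/home/USERNAME/scripts/git/prefactor]
</pre>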
== Some known problems ==
* In <tt>argument.flags</tt> there should be no spaces inside the list! e.g. the space after the comma in the following will cause problems:
<pre>h5imp_cal.argument.flags = [h5_imp_map.output.mapfile, h5imp_cal_losoto.h5]</pre>
* Check that you are sourcing the right LOFAR software version.
7265cc9e30d0ce6072ba0fc0251b79ef21379b48
Software
0
17
435
405
2016-02-23T13:27:09Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 4.5.5 installed in <tt>/soft/gromacs-new</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/MATLAB/R2013b/bin/Matlab</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
* <u>[[ciao]]</u>: in <tt>/soft/ciao-x.x</tt>
e2a56110c439ace3b692c9e1a44c3c67a61da509
468
435
2017-03-23T10:03:32Z
H.patel
14
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 2016 - GPU acceleration installed in <tt>/soft/gromacs-2016-gpu</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/MATLAB/R2013b/bin/Matlab</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
* <u>[[ciao]]</u>: in <tt>/soft/ciao-x.x</tt>
a37c0a7568d4b27a3a4db025db079ccdbdbdf74d
471
468
2017-03-23T10:21:51Z
H.patel
14
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 2016 (with GPU acceleration) installed in <tt>/soft/gromacs-2016-gpu</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/MATLAB/R2013b/bin/Matlab</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
* <u>[[ciao]]</u>: in <tt>/soft/ciao-x.x</tt>
4c3a37b091c81a0551770d26649a036e3b09cd13
Ciao
0
66
436
2016-02-23T13:27:41Z
Mjh
2
Created page with "CIAO is the Chandra data reduction software. Access it by doing <tt>source /soft/ciao-4.8/ciao-4.8/bin/ciao.csh</tt>"
wikitext
text/x-wiki
CIAO is the Chandra data reduction software.
Access it by doing <tt>source /soft/ciao-4.8/ciao-4.8/bin/ciao.csh</tt>
59d0f4bc229cb31f52e893e459a88a4b4aef155d
Main Page
0
1
445
409
2016-04-01T20:15:18Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the STRI cluster.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to do a particular thing on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
51677982500d58a12c78358571b5d9b93dc4c73e
Monitoring
0
67
446
2016-04-01T20:17:27Z
Mjh
2
Created page with "Some monitoring services run on the head node. You can monitor the state of the main servers with Munin[http://stri-cluster.herts.ac.uk/munin/] and the state of the nodes and..."
wikitext
text/x-wiki
Some monitoring services run on the head node. You can monitor the state of the main servers with Munin[http://stri-cluster.herts.ac.uk/munin/] and the state of the nodes and network with Ganglia[http://stri-cluster.herts.ac.uk/ganglia].
b97abe1eb332e3b2742c1870b48f483fb8070692
Jobs
0
9
447
383
2016-04-27T19:59:54Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all 48 nodes in the main cluster may need to wait for days.
Once your job starts running, the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
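A quick way to check that this is working (a sketch; <tt>node001</tt> is just an example node name) is to run a command on another node non-interactively:
<pre>
# should print the remote node's hostname without asking for a password
ssh -o BatchMode=yes node001 hostname
</pre>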
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
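For example (a sketch reusing the <tt>myjob2.sh</tt> script above), the <tt>-N</tt> given on the command line overrides the <tt>#PBS -N hello</tt> directive inside the script, so the job appears in the queue as <tt>hello-cli</tt>:
<pre>
qsub -N hello-cli myjob2.sh
</pre>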
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and/or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it is generated, but it will appear in your home directory; if you don't, the output will be stored in the directory specified by -o and -e (or in the current working directory of qsub if these are not given), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that, in the present configuration, requests for fewer than the 8 available processors per node may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, would actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always ask for, and use, 8 processes per node for MPI or multi-threaded code. If you are running single-threaded code, ask for one processor per node.
The defaults for all the other options are sensible, but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need makes it harder to schedule your jobs efficiently; requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion, since a job that exceeds its walltime estimate will be terminated. ''You should always specify a walltime: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
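For example (a sketch: the job ID and target queue are only illustrative):
<pre>
qdel 1775                          ## remove job 1775 from the queue
qalter -l walltime=10:0:0 1775     ## change the walltime request of queued job 1775
qmove smp 1775                     ## move job 1775 to the smp queue
</pre>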
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only run on a single node, you need to make sure that the code does not grab all available CPUs. An easy way to ensure that you take only the CPUs you have been allocated is to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`   # sh/bash syntax; in a csh script use: setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
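Putting this together, a minimal single-node OpenMP job script might look like the sketch below; <tt>my-openmp-code</tt> is a placeholder for your own executable.
<pre>
#!/bin/sh
#PBS -N openmp-test
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
#PBS -j oe
cd $PBS_O_WORKDIR                                    # run in the directory from which qsub was invoked
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`   # one thread per allocated CPU
./my-openmp-code
</pre>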
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
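As a sketch, a generic script controlled in this way might look like the following (the program and file names are placeholders):
<pre>
#!/bin/sh
#PBS -N generic-job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
cd $PBS_O_WORKDIR
echo "Processing dataset $NAME"
./my-analysis ${NAME}.dat ${NAME}.out    # $NAME was set with 'qsub -v NAME=...'
</pre>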
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These all run the same script but differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>; your script should normally use this variable to vary its behaviour. For example, multiple runs of a Monte Carlo simulation might use it to store their output in different files (see the sketch below). The jobs are all submitted to the queue immediately and may run concurrently, depending on the resources they request, so make sure that it is safe for them to do so.
A common mistake is to pass a single number to <tt>-t</tt> when a range of jobs is intended.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
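As a sketch, a script for such an array job might use <tt>$PBS_ARRAYID</tt> like this (the program name and its arguments are placeholders):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
./my-monte-carlo --seed $PBS_ARRAYID > run-${PBS_ARRAYID}.out   # one output file per array element
</pre>
Submitted with <tt>qsub -t 1-4 myjob.qsub</tt>, this would produce run-1.out to run-4.out.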
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its exit status).
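Since <tt>qsub</tt> prints the full ID of the job it has just submitted, a pair of dependent jobs can be chained as in the following sketch (the script names are placeholders):
<pre>
FIRST=`qsub stage1.sh`                      # capture the job ID of the first job
qsub -W depend=afterany:$FIRST stage2.sh    # stage2.sh runs only once stage1.sh has finished
</pre>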
af1d5e98732a89478b4b925b7e6269d6ae23c771
Neuron
0
53
448
369
2016-07-29T15:16:03Z
Dbab
11
wikitext
text/x-wiki
Neuron is installed in /soft/nrn.
To run neuron you need the library path, and one of the MPI installations, on your path. Set them using
<pre>
setenv LD_LIBRARY_PATH /soft/lib
setenv PATH ${PATH}:/soft/mpi/openmpi-1.4.3/bin/
</pre>
Note that several MPI installations are available, so you might want to pick a different one.
To make the changes permanent, either add the lines above to your .tcshrc yourself, or copy and paste the following into your terminal:
<pre>
echo 'setenv LD_LIBRARY_PATH /soft/lib' >> ~/.tcshrc
echo 'setenv PATH ${PATH}:/soft/mpi/openmpi-1.4.3/bin/' >> ~/.tcshrc
</pre>
Now you can run neuron using
<pre>
/soft/nrn/x86_64/bin/nrniv
/soft/nrn/x86_64/bin/nrngui
etc.
</pre>
Don't run actual experiments this way, though: to do that you need to submit them as [[Jobs]].
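As a sketch, a qsub script for a neuron run might look like this; the hoc file name is a placeholder, and sh-style <tt>export</tt> lines replace the <tt>setenv</tt> commands above because the script uses /bin/sh.
<pre>
#!/bin/sh
#PBS -N neuron-run
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
export LD_LIBRARY_PATH=/soft/lib
export PATH=${PATH}:/soft/mpi/openmpi-1.4.3/bin/
cd $PBS_O_WORKDIR
/soft/nrn/x86_64/bin/nrniv mymodel.hoc    # placeholder model file
</pre>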
a41fe23c4a172da45c8a474734df977eaee3f692
LOFAR-UK Compute Facility
0
57
449
422
2016-08-04T16:06:53Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes mostly with 64 GB RAM and 16 cores. (Some machines have 192 or 256 GB RAM.) Reservations on the more powerful [[SMP machines]] can be made if necessary.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive jobs, as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
a46d6a96c2282f3d09803d838e6759dbdf2fb444
452
449
2016-08-04T16:14:52Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 128 GB RAM and 16 cores, plus some number (up to 12) of compute nodes mostly with 64 GB RAM and 16 cores. (Some machines have 192 or 256 GB RAM.) Reservations on the more powerful [[SMP machines]] can be made if necessary.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive [[jobs]], as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
eccb4b4909d736b86e3e86dc941a5ec871e5663b
460
452
2016-12-06T17:12:58Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the STRI cluster reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 256 GB RAM and 16 cores, plus some number (up to 12) of compute nodes mostly with 64 GB RAM and 16 cores. (Some machines have 192 or 256 GB RAM.) Reservations on the more powerful [[SMP machines]] can be made if necessary.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive [[jobs]], as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
3af075a16d54f16da479f42896c509ada2649fec
LOFAR
0
47
450
385
2016-08-04T16:12:05Z
Mjh
2
wikitext
text/x-wiki
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
Then a full setup for up-to-date versions of the LOFAR software looks something like this:
<pre>
setenv PATH /soft/casacore-290316/bin:/soft/casa-release-4.5.0-el6:${PATH}
source /soft/lofar-090616/lofarinit.csh
setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages/:$PYTHONPATH
setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
</pre>
The LOFAR software is frequently updated. For the most up-to-date versions look for /soft/lofar-date where date is a numeric build date. Then
source /soft/lofar-date/lofarinit.csh instead.
423f17b751873d26b2d10e4d2b554a9b7dfbc48d
451
450
2016-08-04T16:13:53Z
Mjh
2
wikitext
text/x-wiki
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
Then a full setup for up-to-date versions of the LOFAR software looks something like this:
<pre>
setenv PATH /soft/casacore-290316/bin:${PATH}
source /soft/lofar-090616/lofarinit.csh
setenv PYTHONPATH /soft/pyrap-220216/usr/lib64/python2.7/site-packages/:$PYTHONPATH
setenv LD_LIBRARY_PATH /soft/boost/lib:/soft/casacore-220216/lib:$LD_LIBRARY_PATH
</pre>
The LOFAR software is frequently updated. For the most up-to-date versions look for /soft/lofar-date where date is a numeric build date. Then
source /soft/lofar-date/lofarinit.csh instead.
For running FACTOR you will also want casapy (<tt>/soft/casa-release-4.5.0-el6</tt>) and/or wsclean (<tt>/soft/wsclean/bin</tt>) on your path as well.
fb63014a96dc6e1ec8fde6c450a9cabe09a1dcbe
Accounts
0
3
453
306
2016-09-05T15:45:19Z
Asinha
12
wikitext
text/x-wiki
To get an account, speak to Leigh Smith in E117C.
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research (CAIR)
* Other research-active members of the School of Physics, Astronomy and Mathematics (PAM)
* Members of the School of Computer Science (CS)
* Others, by special arrangement; restricted to those who have made a financial contribution to the cluster.
Access is granted subject to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
c09b5c4696218d930c46663ad70ef67a64a34ce4
Acknowledgements
0
29
454
304
2016-09-29T09:46:16Z
Mjh
2
wikitext
text/x-wiki
If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it.
We don't insist on any explicit form of words, but an example might be 'This work has made use of the University of Hertfordshire's high-performance computing facility.'
If you wish you can add a link to <tt>http://stri-cluster.herts.ac.uk/</tt>.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Cluster bibliography]] page.
036e3a59a2a0a019b51851d4c8c04e70ee465178
Architecture
0
7
459
398
2016-12-06T17:12:10Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 GB RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
* Two SMP blades in chassis6 with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
6709bcad63cd88bdd45d476c574fcf1e789baa0b
462
459
2016-12-06T17:16:57Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 GB RAM for user logins and development
* 140 compute nodes (or just 'nodes'), as follows:
** 48 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the Main cluster (chassis 1, 2 and 3)
** 32 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (chassis 4 and 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
* Two SMP blades in chassis6 with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A GPU machine, gpu1
* A [[Tesla|Tesla K80]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
d4b0a9a43819f22ba9582c5eacec4ccc8adff476
SMP machines
0
24
461
343
2016-12-06T17:14:01Z
Mjh
2
wikitext
text/x-wiki
The SMP machines are:
* smp1, smp2: two 4-processor, 48-core systems, each with 256 GB of RAM, available for general use. The individual cores in these machines are slightly slower (~20% on CPU-bound floating-point operations) than the Intel CPUs of the main computing nodes, so users are recommended to use the main part of the cluster for straightforward computing requirements that do not need the special features of the SMP machines.
* smp3: one 4-processor, 32-core system with 2.2-GHz E5-4620 Intel CPUs and 256 GB RAM, available to CAR users only.
* node095, node096: two 2-processor, 16-core systems with 256 GB RAM.
The big advantage of the SMP machines is the large amount of physical memory visible to all cores. This allows for multi-threaded, shared-memory applications.
The SMP machines smp1-3 also each have a large amount of local scratch space (10 TB for smp1/2, 100 TB for smp3), which is mounted as /scratch on the SMP machines and visible as /smp1, /smp2 and /smp3 on the head node. smp3 is intended for data reduction for CAR users only. node095 and node096 have no local scratch.
Jobs may be started on the SMP machines using the <tt>smp</tt> [[queues|queue]] in Torque.
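For example (a sketch: the resource request is illustrative, and the script name is a placeholder):
<pre>
qsub -q smp -l nodes=1:ppn=48 -l walltime=12:0:0 bigmem-job.sh
</pre>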
f49c996bd16c5c3532de1d52d08031149e21ab1b
Networking
0
10
463
313
2016-12-06T17:17:35Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected via a main ethernet switch. The links connecting the head node and the SMP machines, and the uplinks from the two node internal ethernet stacks, are carried by multiple physical cables via link aggregation. A separate physical ethernet network carries management traffic.
The infiniband network is slightly more complex. Each chassis has an internal infiniband switch and these are all linked via two main infiniband switches. This arrangement is intended to provide redundancy and higher bandwidth between nodes in different chassis. chassis1-3 use DDR infiniband; all other machines on the network have QDR infiniband cards.
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency and data transfer rates are somewhat higher between nodes in the same chassis than between different chassis in the same cluster, and ethernet connections between the two sub-clusters are higher-latency and lower-bandwidth still. Best results will be obtained running jobs within a single chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all see the public Internet via IP masquerading through the head node through the ethernet network. (For this reason, it is not sensible to try to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The SMP machines have addresses smp1.data, smp1.infi etc.
8fbb6266bcdaa3b1fa6a4f1b6bc994b68c26ac4d
Administrators
0
6
464
404
2017-03-03T13:15:53Z
Asinha
12
wikitext
text/x-wiki
== Administrators ==
These are currently:
* Leigh Smith, l.c.smith@herts.ac.uk (x3358, room E117C)
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 2E71 (Innovation Centre)).
Contact us with queries. Basic support queries (e.g. account requests, difficulty logging on or using software) should be directed to Leigh in the first instance.
8d0fa447d05f0a03683f52d82dd0ce440b3dc93c
User:H.patel
2
68
465
2017-03-22T09:35:47Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
Hershna Patel BSc
Research student (computational biochemistry and bioinformatics)
Department of Biological & Environmental Science
School of Life and Medical Sciences
University of Hertfordshire
Hatfield
AL10 9AB
United Kingdom
Publications
Patel, H., & Kukol, A. (2016). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. Journal of Negative Results in Biomedicine, 15(1), 15.
Patel, H., & Kukol, A. (2016). Recent discoveries of influenza A drug target sites to combat virus replication. Biochemical Society Transactions, 44(3), 932-36.
Kukol, A. & Patel, H. 2014. Influenza A nucleoprotein binding sites for antivirals: current research and future potential. Future Virology, 9(7), 625-27.
a1943d3fca274a6170fd257a983e73cced96216b
User talk:H.patel
3
69
466
2017-03-22T09:35:49Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 09:35, 22 March 2017 (UTC)
ef38c68553f26ea0050459806776ce07e849dca1
Cluster bibliography
0
30
467
384
2017-03-22T13:52:32Z
H.patel
14
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. (2016). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. Journal of Negative Results in BioMedicine, 15(15).
* Kukol A, Hughes DJ, Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, '''2014''', ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B, How the amyloid-β peptide and membranes affect each other: An extensive simulation study, '''2013''', ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A, Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', '''2011''', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', '''2011''', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
1cf18b1e168bc842d08e23320d68400417545e99
Gromacs
0
19
469
375
2017-03-23T10:17:36Z
H.patel
14
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field. The current version is built with GPU support, which offers a significant speed-up of simulation time.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use Gromacs version 2016.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Look here for [[groperform|optimising performance]].
''Andreas/Hershna''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q gpu
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the gpu machine on the cluster
# uses 1 GPU
# set a maximum time of forty eight hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# produce the output while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2016-gpu/bin/GMXRC
# used previously: export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
export LD_LIBRARY_PATH="/soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
# specify working directory:
cd /home/hpatel/gromacsGPU
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
3820d36bd0638aad68ee592dc32dfc12b8d272b2
470
469
2017-03-23T10:20:21Z
H.patel
14
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use Gromacs version 2016.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Look here for [[groperform|optimising performance]]. The current version is built with GPU support, which offers a significant speed-up of simulation time.
''Andreas/Hershna''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q gpu
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the gpu machine on the cluster
# uses 1 GPU
# set a maximum time of forty eight hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# produce the output while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2016-gpu/bin/GMXRC
# used previously: export LD_LIBRARY_PATH='/usr/lib64/mpich2/lib:$LD_LIBRARY_PATH'
export LD_LIBRARY_PATH="/soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
# specify working directory:
cd /home/hpatel/gromacsGPU
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
c995ee5ec6bc44157a729a41c4ac8f1d1765baaa
Vina
0
23
472
372
2017-04-13T11:15:26Z
H.patel
14
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
''Andreas''
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N $f -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
'''If screening over 500 molecules, the following python script must be used.'''
It requires a file called ‘filelist’ (a list of all molecules to be docked) in the working directory.
The script needs to be run twice. The first run generates the job scripts for the screening; each script contains 100 dockings, to be submitted as one job. For the second run, delete the 'test' Popen line (the one that writes each batch to a file named test...) and its comment, uncomment the Popen line containing the qsub command, and run the script again to submit the jobs.
''Hershna''
<pre>
import os
import time
from subprocess import Popen,PIPE
step=100
mypath=os.getcwd()
files=open('filelist').read().splitlines()
i=0
while i<len(files):
    c=0
    # test -- write the job scripts to a file but don't run them
    q=Popen('cat > test'+str(i),shell=True,stdin=PIPE).stdin
    # actual run
    # q=Popen('qsub -N dock-'+str(i)+' -j oe -o /dev/null -l nodes=1:ppn=8 -l walltime=33:00:00',shell=True,stdin=PIPE).stdin
    q.write('#!/bin/bash\n')
    q.write('cd '+mypath+'\n')
    while c<100 and i<len(files):
        f=files[i]
        b='dock_'+f
        q.write('mkdir -p '+b+'\n')
        q.write('/soft/autodock_vina_1_1_1_linux_x86/bin/vina --config '+mypath+'/conf.txt --ligand '+mypath+'/'+f+' --out '+mypath+'/'+b+'/out.pdbqt --log '+mypath+'/'+b+'/log.txt\n')
        i+=1
        c+=1
    q.close()
    time.sleep(2)
</pre>
978f6abfe2581390fe2aeb6778cb413eabb9b542
473
472
2017-04-13T11:19:40Z
H.patel
14
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
''Andreas''
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N $f -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
'''If screening over 500 molecules, the following python script must be used.'''
It requires a file called ‘filelist’ (a list of all molecules to be docked) in the working directory.
The script needs to be run twice. The first run generates the job scripts for the screening; each script contains 100 dockings, to be submitted as one job. For the second run, delete the 'test' Popen line (the one that writes each batch to a file named test...) and its comment, uncomment the Popen line containing the qsub command, and run the script again to submit the jobs.
Start with 'python run-jobs.py'
''Hershna''
<pre>
import os
import time
from subprocess import Popen,PIPE
step=100
mypath=os.getcwd()
files=open('filelist').read().splitlines()
i=0
while i<len(files):
    c=0
    # test -- write the job scripts to a file but don't run them
    q=Popen('cat > test'+str(i),shell=True,stdin=PIPE).stdin
    # actual run
    # q=Popen('qsub -N dock-'+str(i)+' -j oe -o /dev/null -l nodes=1:ppn=8 -l walltime=33:00:00',shell=True,stdin=PIPE).stdin
    q.write('#!/bin/bash\n')
    q.write('cd '+mypath+'\n')
    while c<100 and i<len(files):
        f=files[i]
        b='dock_'+f
        q.write('mkdir -p '+b+'\n')
        q.write('/soft/autodock_vina_1_1_1_linux_x86/bin/vina --config '+mypath+'/conf.txt --ligand '+mypath+'/'+f+' --out '+mypath+'/'+b+'/out.pdbqt --log '+mypath+'/'+b+'/log.txt\n')
        i+=1
        c+=1
    q.close()
    time.sleep(2)
</pre>
040b5138e3107707be78eaa32287d7c4445bc91a
Vina
0
23
474
473
2017-04-13T11:28:01Z
H.patel
14
wikitext
text/x-wiki
[http://vina.scripps.edu/ AutoDock Vina] is software for molecular docking.
It is very easy to use for virtual screening with shell scripts as explained in the manual. The molecules need to be in pdbqt format. This format can be easily generated with Raccoon (see [[Autodock]]).
Copy the pdbqt files, the protein-receptor and the vina configuration file (conf.txt in the example below) into the working directory. The following bash-script carries out the virtual screening. Start with 'nohup vinaScreen.bash > nohup.out &' (not qsub).
''Andreas''
<pre>#!/bin/bash
# start from the folder where the .pdbqt files are located
mypath=`pwd`
for f in ZINC*.pdbqt; do
b="dock_$f"
echo Processing ligand $b
mkdir -p $b
echo "#!/bin/bash" > job.sh
echo "cd $mypath" >> job.sh
echo /soft/autodock_vina_1_1_1_linux_x86/bin/vina --config $mypath/conf.txt --ligand $mypath/$f --out $mypath/${b}/out.pdbqt --log $mypath/${b}/log.txt >> job.sh
qsub -N $f -j oe -l cput=00:12:00 -l nodes=1:ppn=8 -l walltime=00:12:00 -l mem=512mb job.sh
sleep 30 # reduce this waiting time in seconds to speed up
done
</pre>
'''If screening over 500 molecules, the following python script must be used.'''
It requires a file called ‘filelist’ (a list of all molecules to be docked) in the working directory.
The script needs to be run twice. The first run generates the job scripts for the screening; each script contains 100 dockings, to be submitted as one job. For the second run, delete the 'test' Popen line (the one that writes each batch to a file named test...) and its comment, uncomment the Popen line containing the qsub command, and run the script again to submit the jobs.
Start with 'python run-jobs.py'
''Hershna''
<pre>
import os
import time
from subprocess import Popen,PIPE
step=100
mypath=os.getcwd()
files=open('filelist').read().splitlines()
i=0
while i<len(files):
    c=0
    # test -- write the job scripts to a file but don't run them
    q=Popen('cat > test'+str(i),shell=True,stdin=PIPE).stdin
    # actual run
    # q=Popen('qsub -N dock-'+str(i)+' -j oe -o /dev/null -l nodes=1:ppn=8 -l walltime=33:00:00',shell=True,stdin=PIPE).stdin
    q.write('#!/bin/bash\n')
    q.write('cd '+mypath+'\n')
    while c<100 and i<len(files):
        f=files[i]
        b='dock_'+f
        q.write('mkdir -p '+b+'\n')
        q.write('/soft/autodock_vina_1_1_1_linux_x86/bin/vina --config '+mypath+'/conf.txt --ligand '+mypath+'/'+f+' --out '+mypath+'/'+b+'/out.pdbqt --log '+mypath+'/'+b+'/log.txt\n')
        i+=1
        c+=1
    q.close()
    time.sleep(2)
</pre>
94309f3b8aa3a6fa6667c297a728d2f1cf2de530
Autodock
0
22
475
373
2017-04-13T11:39:48Z
H.patel
14
wikitext
text/x-wiki
[http://autodock.scripps.edu/ AutoDock] is software for molecular docking.
It is best used together with the graphical user interface [http://mgltools.scripps.edu/downloads AutoDock Tools]. AutoDock Tools can be installed in your home directory e.g. 'akukol/bin/MGLtools'.
You need to download the file 'mgltools_x86_64Linux2_1.5.4.tar.gz'. The automatic installer does not work.
For virtual screening obtain the software [http://autodock.scripps.edu/resources/raccoon Raccoon]. This needs AutoDock tools to be installed first.
Raccoon automatically generates the file vs_submit.sh. This must be edited according to your special circumstances (see below) and then started with:
'nohup vs_submit.sh &' (do not use qsub)
''Andreas''
<pre>#!/bin/bash
#
# Generated with Raccoon | AutoDockVS
#
#### PBS jobs parameters
CPUT="00:20:00"
WALLT="00:20:00" # << change here
#
# There should be no reason
# for changing the following values
NODES=1
PPN=1
MEM=512mb
### CUSTOM VARIABLES
#
# use the following line to set special options (e.g. specific queues)
#OPT="-q MyPriorQueue"
OPT="-j oe -N AutoDock" # join output and error, job name: Autodock
# Paths for executables on the cluster
# Modify them to specify custom executables to be used
QSUB="qsub" # << change here
AUTODOCK="/soft/autodock/autodock4" # << change here
# Special path to move into before running
# the screening. This is very system-specific,
# so unless you know what you are doing,
# leave it as it is
WORKING_PATH=`pwd`
##################################################################################
##################################################################################
####### There should be no need to modify anything below this line ###############################
##################################################################################
##################################################################################
#
#
type $AUTODOCK &> /dev/null || {
echo -e "\nError: the file [$AUTODOCK] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the AutoDock binary in the script";
echo -e "( i.e. AUTODOCK=/usr/bin/autodock4 )\n\n";
echo -e " [ virtual screening submission aborted]\n"
exit 1; }
type $QSUB &> /dev/null || {
echo -e "\nError: the file [$QSUB] doesn't exist or is not executable\n";
echo -e "Try to specify the full path to the executable of the Qsub command binary in the script";
echo -e "( i.e. QSUB=/usr/bin/qsub )\n\n";
echo -e " [ virtual screening submission aborted]\n"
exit 1; }
echo Starting submission...
for NAME in `cat jobs_list`
do
cd $NAME
echo "#!/bin/bash" > $NAME.job
echo "cd $WORKING_PATH/$NAME" >> $NAME.job
echo "$AUTODOCK -p $NAME.dpf -l $NAME.dlg" >> $NAME.job
chmod +x $NAME.job
echo -n "Submitting $NAME : "
$QSUB $OPT -l cput=$CPUT -l nodes=1:ppn=1 -l walltime=$WALLT -l mem=$MEM $NAME.job
sleep 23 # << add this line to avoid flooding the cluster with 1000nds of jobs
cd ..
done
</pre>
The wait time of 23 seconds may be reduced in order to speed up the calculation.
'''If screening over 500 molecules, the following python script must be used.'''
It requires a file called ‘JobsList’ (a list of all molecules to be docked) in the working directory. This will be automatically generated by Raccoon.
The script needs to be run twice. The first run generates the job scripts for the screening; each script contains 100 dockings, to be submitted as one job. For the second run, delete the 'test' Popen line (the one that writes each batch to a file named test...) and its comment, uncomment the Popen line containing the qsub command, and run the script again to submit the jobs. Start with 'python run-jobsAD4.py'
''Hershna''
<pre>
import os
import time
from subprocess import Popen,PIPE
step=100
mypath=os.getcwd()
files=open('JobsList').read().splitlines()
i=0
while i<len(files):
    c=0
    # test -- write the job scripts to a file but don't run them
    q=Popen('cat > test'+str(i),shell=True,stdin=PIPE).stdin
    # actual run
    # q=Popen('qsub -N AD4-'+str(i)+' -j oe -o /dev/null -l nodes=1:ppn=1 -l walltime=33:00:00',shell=True,stdin=PIPE).stdin
    q.write('#!/bin/bash\n')
    while c<100 and i<len(files):
        f=files[i]
        q.write('cd '+mypath+'/'+f+'\n')
        q.write('/soft/autodock/autodock4 -p '+f+'.dpf -l '+f+'.dlg \n')
        i+=1
        c+=1
    q.close()
    time.sleep(2)
</pre>
48ca91084a0d0224a0390f0dbc1d34112071c9d0
Cluster bibliography
0
30
476
467
2017-06-29T09:02:04Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
b1084dd6d4f09af3eb8893022d519f4138b8b871
Gromacs
0
19
477
470
2017-09-28T08:55:37Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is software for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local machine or on the headnode of the cluster. If you prepare it on the local machine, make sure you use Gromacs version 2016.
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Look here for [[groperform|optimising performance]]. The current version is built with GPU support, which offers a significant speed-up of simulation time. Since Gromacs 2016.4 there is no need to distinguish between the GPU and the non-GPU ('mpi') version.
Note that all GPUs attached to the node are used automatically. The maximum walltime is 48 hours.
''Andreas/Hershna''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q gpu
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the gpu machine on the cluster
# uses 1 GPU
# set a maximum time of forty eight hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# produce the output while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2016.4/bin/GMXRC
export LD_LIBRARY_PATH="/soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
# specify working directory:
cd /home/hpatel/gromacsGPU
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
aa2a8e15505e651c9342ca272aae5f620b95ff68
Accounts
0
3
478
453
2017-10-16T14:30:57Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to Martin Hardcastle in 2E71 (Innovation Centre).
Accounts are available to the following classes of people:
* Members of the Centre for Astrophysics Research (CAR)
* Members of the Centre for Atmospheric & Instrumentation Research (CAIR)
* Other research-active members of the School of Physics, Astronomy and Mathematics (PAM)
* Members of the School of Computer Science (CS)
* Others, by special arrangement; restricted to those who have made a financial contribution to the cluster.
Access is granted subject to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
bc61e4cb28663730cc21335e535e8e4ff0184498
479
478
2017-10-16T14:32:43Z
Mjh
2
wikitext
text/x-wiki
To get an account, speak to Martin Hardcastle in 2E71 (Innovation Centre).
Accounts are available to members of the Schools of PAM, Engineering and Computer Science, and to others by special arrangement.
Access is granted subject to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
1e433a7cf44548c1cb0a2992b929ab627bd3d48d
Architecture
0
7
480
462
2017-11-03T09:27:04Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 GB RAM for user logins and development
* compute nodes (or just 'nodes'), as follows:
** 52 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-052)
** 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node064-79: chassis 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node080-92: chassis 6)
** 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node111: chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node112-127: chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node128-140: chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A GPU machine, gpu1
* A [[Tesla|Tesla K80]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades installed in enclosures (chassis) of 16 blades each: there are 9 chassis in total. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
dd72cf123d0a0781f5023c0dee635377daf09899
493
480
2018-01-23T13:53:27Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* a head node, which is a 2-socket x 4-core x 2-hyperthread Xeon-based machine with 32 GB RAM for user logins and development
* compute nodes (or just 'nodes'), as follows:
** 52 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-052)
** 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node064-79: chassis 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node080-92: chassis 6)
** 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node111: chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node112-127: chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node128-140: chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A GPU machine, gpu1
* A [[Tesla|Tesla K80]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
c6d1eab2ce83ea2894b6a946912c411262c3a73c
497
493
2018-02-03T09:26:29Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
* compute nodes (or just 'nodes'), as follows:
** 52 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-052)
** 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node064-79: chassis 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node080-92: chassis 6)
** 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node111: chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node112-127: chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node128-140: chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9)
* Three [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and one with 32 (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-3).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A GPU machine, gpu1
* A [[Tesla|Tesla K80]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
a932d60addd1f987f6e08d2d4228fba25c011bf7
503
497
2018-03-10T16:29:26Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
* compute nodes (or just 'nodes'), as follows:
** 64 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-064)
** 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node065-80: chassis 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node081-92: chassis 6)
** 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9)
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A [[Tesla|Tesla S2050]] CUDA unit attached to smp1.
* A GPU machine, gpu1
* A [[Tesla|Tesla K80]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
12f8cad95e7a9b583f7daf94133894bb46c1b536
511
503
2018-03-10T18:59:53Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
* compute nodes (or just 'nodes'), as follows:
** 64 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-064)
** 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node065-80: chassis 5)
** 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node081-92: chassis 6)
** 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6)
** 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7)
** 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8)
** 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9)
** 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9)
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM
** 100 TB of [[storage]] attached to the SMP machines
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], which is a 2 socket x 8 core Xeon-based machine with 32 GB RAM ('''CAIR use only''').
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A Tesla S2050 [[GPUs|GPU]] unit attached to smp1.
* A [[GPUs|GPU]] machine, gpu1
* A Tesla K80 [[GPUs|GPU]] unit attached to gpu1
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* Ethernet and infiniband switches to provide connectivity.
The compute nodes are Dell blades. Nodes within a given chassis communicate via switches internal to the chassis; external switches connect the chassis together. See [[networking]] for more details.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
653013f42611725dd723c0d918cad2340531fd29
517
511
2018-03-10T21:03:08Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
== compute nodes ==
* 64 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-064), in the main queue
* 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node065-80: chassis 5), in the cair_l queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node081-92: chassis 6), in the cair_l and cair_s queues
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-9, file servers providing the BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
aa8fc772ffe760839a841effe8ce5dede712f2f6
518
517
2018-03-10T21:03:25Z
Mjh
2
/* compute nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
== compute nodes ==
* 64 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-064), in the main queue
* 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node065-80: chassis 5), in the cair_l queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node081-92: chassis 6), in the cair_l and cair_s queues
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-9, file servers providing the BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
5af925dede525272340fc93fa4a5d5a8a66b7a65
Todo
0
56
481
395
2017-11-13T21:55:51Z
Mjh
2
wikitext
text/x-wiki
2017 To do:
* Decide division of jobs between new head nodes
* OS on new head nodes
* Copy /home and /soft
*Head node services (from old to-do list)
**install new CA and generate certs;
**install slapd and import ldif.
**setup denyhosts;
**setup dnsmasq
**setup exim;
**setup mysql or equivalent
**setup httpd and copy existing web hierarchy;
**setup wiki (in mysql??)
**setup munin;
**setup ganglia;
**setup torque (copy from old including spool dirs for jobs);
**setup maui (copy from old);
**setup iptables;
**setup routing;
**setup nfs shares;
**setup ntp;
**copy /etc/fstab entries;
**copy /etc/rc.d/rc.local (license managers etc)
**copy existing system cron jobs (home backup)
**copy /root (re: scripts )
* Full SL upgrade on all compute and server nodes & reboot
* Improve infiniband topology
* BeeGFS:
** back up
** move management and metadata servers
** servers to top-level Infiniband switch
* Upgrade Torque?
* Copy /stri-data and /cair-data to new Beegfs volumes (can't be done till after management/metadata moves)
00fbeaf135455c83d388a81346fc88d3db108b07
482
481
2017-11-13T21:59:27Z
Mjh
2
wikitext
text/x-wiki
2017 To do:
* Decide division of jobs between new head nodes
* OS on new head nodes
* Copy /home and /soft
*Head node services (from old to-do list)
**install new CA and generate certs;
**install slapd and import ldif.
**setup denyhosts;
**setup dnsmasq
**setup exim;
**setup mysql or equivalent
**setup httpd and copy existing web hierarchy;
**setup wiki (in mysql??)
**setup munin;
**setup ganglia;
**setup torque (copy from old including spool dirs for jobs);
**setup maui (copy from old);
**setup iptables;
**setup routing;
**setup nfs shares;
**setup ntp;
**copy /etc/fstab entries;
**copy /etc/rc.d/rc.local (license managers etc)
**copy existing system cron jobs (home backup)
**copy /root (re: scripts )
* Full SL upgrade on all compute and server nodes & reboot
* Improve infiniband topology
* BeeGFS:
** back up
** move management and metadata servers
** servers to top-level Infiniband switch
* Upgrade Torque?
* Copy /stri-data and /cair-data to new Beegfs volumes (can't be done till after management/metadata moves)
* Physical removal of old head node, old storage volumes
2d6dae8a57121543fa15f9b4f6caabb000a54fa0
488
482
2017-12-11T10:14:34Z
Mjh
2
wikitext
text/x-wiki
2017 To do:
* Decide division of jobs between new head nodes: done
* OS on new head nodes: done
* Copy /home and /soft
*Head node services (from old to-do list)
**install new CA and generate certs;
**install slapd and import ldif.
**setup denyhosts;
**setup dnsmasq
**setup exim;
**setup mysql or equivalent
**setup httpd and copy existing web hierarchy;
**setup wiki (in mysql??)
**setup munin;
**setup ganglia;
**setup torque (copy from old including spool dirs for jobs);
**setup maui (copy from old);
**setup iptables;
**setup routing;
**setup nfs shares;
**setup ntp;
**copy /etc/fstab entries;
**copy /etc/rc.d/rc.local (license managers etc)
**copy existing system cron jobs (home backup)
**copy /root (re: scripts )
* Full SL upgrade on all compute and server nodes & reboot: all upgraded and most rebooted, need to reboot chassis7/8
* Improve infiniband topology
* BeeGFS:
** back up: done
** move management and metadata servers: done
** servers to top-level Infiniband switch
* Upgrade Torque?
* Copy /stri-data and /cair-data to new Beegfs volumes (can't be done till after management/metadata moves)
* Physical removal of old head node, old storage volumes
99e1a5561921d9c638c15a36d87fc91385f0e968
489
488
2017-12-11T19:26:55Z
Mjh
2
wikitext
text/x-wiki
2017 To do:
* Decide division of jobs between new head nodes: done
* OS on new head nodes: done
* Copy /home and /soft: done
*Head node services (from old to-do list)
**install new CA and generate certs;
**install slapd and import ldif.
**setup denyhosts;
**setup dnsmasq
**setup exim;
**setup mysql or equivalent
**setup httpd and copy existing web hierarchy;
**setup wiki (in mysql??)
**setup munin;
**setup ganglia;
**setup torque (copy from old including spool dirs for jobs);
**setup maui (copy from old);
**setup iptables;
**setup routing;
**setup nfs shares;
**setup ntp;
**copy /etc/fstab entries;
**copy /etc/rc.d/rc.local (license managers etc)
**copy existing system cron jobs (home backup)
**copy /root (re: scripts )
* Full SL upgrade on all compute and server nodes & reboot: all upgraded and most rebooted, need to reboot chassis7/8
* Improve infiniband topology
* BeeGFS:
** back up: done
** move management and metadata servers: done
** servers to top-level Infiniband switch
* Upgrade Torque?
* Copy /stri-data and /cair-data to new Beegfs volumes (can't be done till after management/metadata moves)
* Physical removal of old head node, old storage volumes
d1778c68cccaac1bf816461f4efe792d7e8f47ee
Administrators
0
6
483
464
2017-11-17T13:04:50Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 2E71 (Innovation Centre)).
Contact us with queries.
44925cea29e244d6743937eac5e0f73184dbb2f0
Known problems
0
25
484
412
2017-11-30T11:56:36Z
Mjh
2
/* Known problems */
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA and so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+Infiniband is unstable.
* The scheduler sometimes crashes for unknown reasons causing jobs not to run. (Regularly run scripts check and restart the scheduler.)
* The scheduler very occasionally will not run a job that could start immediately on free resources. If <tt>checkjob</tt> shows 'job can run' and <tt>showstart</tt> shows an immediate start, but your job does not run, then please contact the [[administrators]] (see the example commands below).
* Node specifications of the form <tt>nodes=main:ppn=16</tt> or <tt>nodes=smp:ppn=1</tt> will severely confuse the scheduler, although they are valid. Please do not use queue names in node specifications: always do something like <tt>-q main -l nodes=1:ppn=16</tt> instead.
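For reference, the two commands mentioned above take the job ID as an argument; for example (123456 is a placeholder job ID):
<pre>
checkjob 123456
showstart 123456
</pre>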
== Node hardware/sw (for admin use only) ==
* node001 -- thermal issues at high load
* node069 -- hardware failure (replace with spare)
* node110-111 -- offline for beegfs testing
* node112 -- infiniband not recognised
04d742aa76b444bce8a4f9f33902c67138d6da9f
Storage
0
8
485
376
2017-12-06T11:00:49Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are constantly being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 61 TB of scratch available to all users, mounted as /stri-data
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 59 TB of scratch for CAIR users only, mounted as /cair-data
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 309 TB of beegfs storage nominally distributed as follows:
* 90 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 73 TB: LOFAR-UK, under /beegfs/local
Using a given subdirectory simply indicates that you believe you are entitled to use that allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
f4d6beb23064e064c75fac1df0c3e830131aba29
491
485
2018-01-23T13:51:39Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are constantly being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 61 TB of scratch available to all users, mounted as /stri-data
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 59 TB of scratch for CAIR users only, mounted as /cair-data
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 399 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 73 TB: LOFAR-UK, under /beegfs/local
Using a given subdirectory simply indicates that you believe you are entitled to use that allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
b3747f69cfaebbb74edd2b0dd3e879678386e803
492
491
2018-01-23T13:51:58Z
Mjh
2
/* System-wide NFS storage */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are constantly being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 59 TB of scratch for CAIR users only, mounted as /cair-data
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 399 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 73 TB: LOFAR-UK, under /beegfs/local
Using a given subdirectory simply indicates that you believe you are entitled to use that allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
3951888e1cb60e65062c479be34dbdf9ea17fae3
498
492
2018-02-03T09:27:10Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are constantly being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 59 TB of scratch for CAIR users only, mounted as /cair-data
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 470 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 144 TB: LOFAR-UK, under /beegfs/lofar
Using a given subdirectory simply indicates that you believe you are entitled to use that allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
587253491a7fa71c3ac1618e7ea69e4ae3df65d7
515
498
2018-03-10T20:49:22Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are constantly being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
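For example, a minimal sketch of keeping large job output off /home while still working relative to it (the directory names are illustrative only; use whichever data area you are actually entitled to):
<pre>
# make a personal directory on the distributed storage and link it into /home
mkdir -p /beegfs/general/$USER
ln -s /beegfs/general/$USER ~/bigdata
# anything written to ~/bigdata now lands on the data area, not on /home
</pre>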
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 560 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 144 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Using a given subdirectory simply indicates that you believe you are entitled to use that allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
bb898b4f7715ecaa780034c5200819e69ce6d08a
Ramdisks
0
54
486
378
2017-12-06T11:04:56Z
Mjh
2
wikitext
text/x-wiki
All nodes have a 16-Gb ramdisk set up by default.
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to do local, fast file operations.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use, via the <tt>pmem</tt> option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,pmem=10gb</pre>
* As part of your <tt>qsub</tt> script, you must create a directory in /dev/shm, unique to your job, in which the job will work. For example, you might want to do
<pre>
mkdir /dev/shm/$PBS_JOBID
cd /dev/shm/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* When your job finishes, it must clear up the filespace it used before exiting; no files must be left in <tt>/dev/shm</tt>.
Note that /dev/shm is by nature volatile. When a machine is rebooted, the contents of /dev/shm will be irretrievably lost.
If you want larger, non-volatile local storage, see [[local disk space]].
31ca04630864c03b613f085808a60cdcc5bb6657
487
486
2017-12-06T11:05:31Z
Mjh
2
wikitext
text/x-wiki
All nodes have RAM-backed storage by default (provided by the OS).
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to do local, fast file operations.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use, via the <tt>pmem</tt> option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,pmem=10gb</pre>
* As part of your <tt>qsub</tt> script, you must create a directory in /dev/shm, unique to your job, in which the job will work. For example, you might want to do
<pre>
mkdir /dev/shm/$PBS_JOBID
cd /dev/shm/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* When your job finishes, it must clear up the filespace it used before exiting; no files must be left in <tt>/dev/shm</tt> (see the example script below).
Note that /dev/shm is by nature volatile. When a machine is rebooted, the contents of /dev/shm will be irretrievably lost.
If you want larger, non-volatile local storage, see [[local disk space]].
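Putting these rules together, a minimal sketch of a job script that uses the ramdisk (the program, the output file name and the resource request are placeholders to adjust for your own work):
<pre>
#!/bin/bash
#PBS -l nodes=1,pmem=10gb,walltime=01:00:00
# work in a ramdisk directory unique to this job
WORKDIR=/dev/shm/$PBS_JOBID
mkdir $WORKDIR
cd $WORKDIR
# ... run your program here, writing its scratch files to $WORKDIR ...
# copy anything worth keeping back to the submission directory
cp results.dat $PBS_O_WORKDIR/
# clean up so that nothing is left in /dev/shm
cd /
rm -rf $WORKDIR
</pre>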
78e0c3302b6baf96755763772e26f1af02e28d35
Main Page
0
1
490
445
2018-01-23T13:48:35Z
Mjh
2
/* Welcome to the cluster documentation wiki */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Known problems ==
* [[Known problems]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
38b2f76d28ee115cabffc1096c9474c39ef9d7b6
504
490
2018-03-10T18:39:40Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[Tesla]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
== Known problems ==
* [[Known problems]]
2955f9395ee95cd823b70383f4e18d06e27e2cbc
510
504
2018-03-10T18:58:24Z
Mjh
2
/* Using the cluster */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[GPUs]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
== Known problems ==
* [[Known problems]]
884444520a499741e5f95866af1b9ef464a3699a
Acknowledgements
0
29
494
454
2018-02-03T07:48:35Z
Mjh
2
wikitext
text/x-wiki
If possible, please say explicitly that you have used the STRI cluster in any paper you publish that makes use of results obtained using it.
We don't insist on any explicit form of words, but an example might be 'This work has made use of the University of Hertfordshire's high-performance computing facility.'
If you wish you can add a link to <tt>http://uhhpc.herts.ac.uk/</tt>.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Cluster bibliography]] page.
ebbcc8dfa8c37e921d908d353b4d6522575db3da
495
494
2018-02-03T07:48:50Z
Mjh
2
wikitext
text/x-wiki
If possible, please say explicitly that you have used UH HPC in any paper you publish that makes use of results obtained using it.
We don't insist on any explicit form of words, but an example might be 'This work has made use of the University of Hertfordshire's high-performance computing facility.'
If you wish you can add a link to <tt>http://uhhpc.herts.ac.uk/</tt>.
Please also add details of any submitted, accepted or published paper using the cluster to the [[Cluster bibliography]] page.
7fe78d3f8bb0d6998f0b0191b21712ea8736ea9d
Access
0
5
496
338
2018-02-03T09:23:07Z
Mjh
2
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]]s of the cluster are accessible by ssh to uhhpc.herts.ac.uk, once you have an [[accounts|account]] set up.
If you are working from a Unix desktop, you should be able to type <tt>ssh username@uhhpc.herts.ac.uk</tt>. If you are using Windows, you will need a Windows ssh client: we recommend PuTTY [http://www.chiark.greenend.org.uk/~sgtatham/putty/].
Unless specific authorization from the [[administrators]] is provided to the contrary, individual compute nodes must be accessed either through batch [[jobs]] or via [[interactive jobs]] run on the head node: see also the [[policies|policy]] relating to this.
9aeb56e510e15eb3c9a4f8acdf2fae17856146be
506
496
2018-03-10T18:50:43Z
Mjh
2
wikitext
text/x-wiki
== Access ==
The [[architecture|head node]]s of the cluster are accessible by ssh to uhhpc.herts.ac.uk, once you have an [[accounts|account]] set up.
If you are working from a Unix desktop, you should be able to type <tt>ssh username@uhhpc.herts.ac.uk</tt>. If you are using Windows, you will need a Windows ssh client: we recommend PuTTY [http://www.chiark.greenend.org.uk/~sgtatham/putty/].
Unless specific authorization from the [[administrators]] is provided to the contrary, individual compute nodes must be accessed either through batch [[jobs]] or via [[interactive jobs]] run on the head nodes: see also the [[policies|policy]] relating to this. You may not log in to compute nodes directly, or run code on the head nodes.
ac03f76c9dc50fd9245838592a3df3b3f506fd82
Interactive jobs
0
35
499
269
2018-02-03T09:27:55Z
Mjh
2
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case forbidden by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, unless explicitly authorized otherwise, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@headnode1 ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=48 -I -q smp
</pre>
will reserve all 48 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== Specific machines ===
It is possible to request a specific machine just as for normal non-interactive [[jobs]]:
<pre>
qsub -l walltime=00:01:00 -l nodes=smp2:ppn=48 -I -q smp
</pre>
Do this if you know that you need to use some particular feature of the machine, such as the [[SMP machines]]' scratch discs.
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
1cda9f631819784d0552ea3b2ae40db3a74a28d5
LOFAR-UK Compute Facility
0
57
500
460
2018-02-03T09:28:36Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the UH HPC facility reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 256 GB RAM and 16 cores, plus some number (up to 12) of compute nodes mostly with 64 GB RAM and 16 cores. (Some machines have 192 or 256 GB RAM.) Reservations on the more powerful [[SMP machines]] can be made if necessary.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/data/lofar/</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
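For example, a Herts-based user might submit a job against the LOFAR reservation like this (a sketch only: the script name and resource request are placeholders, and any other <tt>qsub</tt> options you normally use still apply):
<pre>
qsub -W group_list=lofar -l nodes=1:ppn=16,walltime=24:00:00 myscript.sh
</pre>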
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive [[jobs]], as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
4de38ece545c858c9d4891297022cf0b6886574b
Cair-cluster
0
40
501
275
2018-02-03T09:30:06Z
Mjh
2
wikitext
text/x-wiki
== CAIR data processing server ==
There is now a dedicated file server for CAIR users. The hostname is <code>cair-cluster</code>, which is accessible from the private data network and the UH student network (using the FQDN <code>cair-cluster.herts.ac.uk</code>). The server is a Dell R520 with two Intel Xeon E5-2450L 1.80GHz processors and 32 GB RAM. It is connected to the "cair" InfiniBand network (192.168.4.0) via a dual-port QDR HBA. The server has ~77 TB of directly attached (via Fibre Channel) storage, which has been configured to a RAID6 specification and is mounted as /cair-storage (on all CAIR nodes and the head node).
This server can be used for post-processing of large datasets. We have also enabled job submission on this server, so if preferred, CAIR users do not have to log on to <code>uhhpc</code> at all.
a110a8349250c00bb66b135cd9df2fe5b52a1aec
Policies
0
4
502
292
2018-03-10T16:26:33Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* Accounts are for use by the named user only. You must not allow anyone else to use your account.
* The [[architecture|head node]]s must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a [[fair share]] policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on a timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
* If you are one of the group of people with exclusive or privileged use of a group of nodes, you must use those nodes in preference to the nodes of the main cluster if possible. You should not run jobs in the 'main' queue unless no nodes in your reserved group are available. This applies to members of CAIR and CAR.
c317ecce84939ae09185440d3a5d66a85be0aa33
505
502
2018-03-10T18:49:29Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* Accounts are for use by the named user only. You must not allow anyone else to use your account.
* The [[architecture|head node]]s must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a [[fair share]] policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on a timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
* If you are one of the group of people with exclusive or privileged use of a group of nodes, you must use those nodes in preference to the nodes of the main cluster if those nodes can meet your requirements. You should not run jobs in the 'main' queue unless no nodes in your reserved group are available. This applies to members of CAIR and CAR.
7fbbbf4a6eca718b073f575b6cc3b6234ae4cb78
Jobs
0
9
507
447
2018-03-10T18:54:11Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
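One common way to set this up, sketched below on the assumption that your home directory is shared across nodes (see the [[passwordless ssh]] page for the recommended procedure), is to generate a key with an empty passphrase and add the public key to your own <tt>authorized_keys</tt> file:
<pre>
ssh-keygen -t rsa                                # accept the defaults and leave the passphrase empty
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
</pre>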
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
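For example, with <tt>myjob2.sh</tt> as written above, the name given on the command line overrides the <tt>#PBS -N hello</tt> directive in the script:
<pre>
qsub -N hello-test myjob2.sh   ## the job appears in the queue as "hello-test", not "hello"
</pre>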
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and/or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it is generated, but it will appear in your home directory. If you don't, the output will be stored in the directory specified by -o and -e (or in the working directory from which qsub was run, if these are not given), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime, 24 hours on most queues, applies; see [[Queues]]).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=40 job.sh ## run for two hours on 40 CPUs (which may be spread over up to 40 physical nodes)
qsub -l walltime=1:0:0 -l nodes=16:ppn=8 job.sh ## run for one hour on 16 nodes with 8 CPUs per node
qsub -l walltime=0:1:0 -l nodes=2:chassis1:ppn=4 job2.sh ## run for one minute on 2 nodes from chassis1 with 4 CPUs per node
qsub -l walltime=0:5:0 -l nodes=1:chassis1:ppn=1+1:chassis2:ppn=1 job3.sh ## run for 5 minutes on one node from chassis1 and one from chassis2
qsub -l walltime=0:0:10 -l nodes=node001:ppn=8 job4.sh ## run for 10 seconds on all CPUs of node001
qsub -l walltime=0:1:0 -l nodes=1:ppn=2 -l pmem=8gb job5.sh ## Run two processes for one minute, requesting 8 Gb of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. The exception might be if you want to guarantee that all inter-process communications take place within a chassis; see [[networking]]).
Note also that in the present configuration requests for fewer processors per node than are available may be consolidated onto fewer nodes. So the third request above, run on an empty cluster, might actually run on one physical node with 8 CPUs. This should never be a problem unless you are specifically testing inter-node communication. For efficiency, we recommend that you always ask for and use the maximum number of processes per node for MPI or multi-threaded code -- that is, 32 processes for the main cluster. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
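On a busy system it is often convenient to restrict the listing to your own jobs, or to examine a single job in detail; the standard Torque options shown below do this (substitute your own username and job ID):
<pre>
qstat -u mjh     # list only jobs belonging to user mjh
qstat -f 1765    # show the full status of job 1765
</pre>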
The MAUI tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
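For example (the job IDs, walltime and queue name below are illustrative):
<pre>
qdel 1775                          # remove job 1775 from the queue
qalter -l walltime=10:00:00 1774   # change the walltime request of queued job 1774
qmove smp 1773                     # move queued job 1773 to the smp queue
</pre>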
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`    # use "setenv OMP_NUM_THREADS ..." instead if your script is written in csh
</pre>
in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
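A minimal sketch of such a generic script (the <tt>process_dataset</tt> program and the file names are purely illustrative):
<pre>
#!/bin/sh
#PBS -N generic
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
cd $PBS_O_WORKDIR
echo "Processing dataset $NAME"        # $NAME was set with qsub -v NAME=...
./process_dataset $NAME > ${NAME}.log
</pre>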
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
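As a sketch, a Monte Carlo script submitted with <tt>qsub -t 1-4</tt> might use the array ID to keep the runs separate (the <tt>my_simulation</tt> program is hypothetical):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
cd $PBS_O_WORKDIR
./my_simulation --seed $PBS_ARRAYID > run_${PBS_ARRAYID}.out   # one output file per array element
</pre>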
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
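Since <tt>qsub</tt> prints the ID of the job it has just created, a simple two-stage chain can be set up directly from the command line (the script names are illustrative):
<pre>
JOB1=`qsub stage1.sh`
qsub -W depend=afterany:$JOB1 stage2.sh   # stage2 starts only after stage1 has finished
</pre>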
01d909f28a66ec93781271e68664378c9dec1332
Queues
0
15
508
380
2018-03-10T18:56:32Z
Mjh
2
wikitext
text/x-wiki
There are five job queues available for general use on the system:
* 'main' is the default queue: this submits to the 78 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 27 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
fe8a416e71570583c2fb442974799479ddf51f03
516
508
2018-03-10T20:50:51Z
Mjh
2
wikitext
text/x-wiki
There are six possible job queues available for general use on the system:
* 'main' is the default queue: this submits to the 78 nodes of the main cluster. The maximum wall time on this queue is 1 week.
* 'gpu' submits to the GPU nodes. The maximum wall time on this queue is 48 hours.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'cair_s' submits to a CAIR node dedicated to small (few CPU) jobs. This queue is restricted to CAIR users.
* 'cair_l' submits to the 27 CAIR nodes with no maximum wall time. This queue is restricted to CAIR users.
* 'car' submits to the 32 dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
== Default wall times ==
The default wall time limitation for the 'cair_s' queue is also the maximum, i.e. 6 hours.
The default wall time for all the other queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
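For example, to run a three-day job on the main queue rather than accepting the 24-hour default, the wall time must be requested explicitly (the resource numbers here are illustrative):
<pre>
qsub -q main -l walltime=72:00:00 -l nodes=1:ppn=32 myjob.sh
</pre>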
dcc0dbe5282e0a522b79d8c290f728c4477b0448
Read this first
0
70
509
2018-03-10T18:57:48Z
Mjh
2
Created page with "= Introduction to cluster computing = If you are new to the concept of cluster computing, read this '''before doing anything else'''. Cluster computing is fundamentally diff..."
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes' which are individual computers, joined together by a network. Nodes have different roles. Specifically there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. You do not run code on the login nodes. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes. Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements.
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
Please don't approach the [[administrators]] for help until you have read and understood these pages.
c2aae6699a4e8afbd2e56bf13f58067207540d2a
514
509
2018-03-10T20:47:12Z
Mjh
2
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes' which are individual computers, joined together by a network. Nodes have different roles. Specifically there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. You do not run code on the login nodes. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes. Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements.
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
* [[Queues]] -- to understand which queue to use
* [[Storage]] -- to understand how and where to store data on the cluster
Please don't approach the [[administrators]] for help until you have read and understood these pages.
01f52222538956374a9276a4bf38b4fbc488c9c5
GPUs
0
71
512
2018-03-10T20:31:58Z
Mjh
2
Created page with "Several machines on the cluster have attached NVIDIA GPUs. * gpu1 is the main cluster gpu machine. It is currently the only machine in the <tt>gpu</tt> queue. The attached GP..."
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1 is the main cluster gpu machine. It is currently the only machine in the <tt>gpu</tt> queue. The attached GPUs are 6 Tesla K80 units.
* smp1 has 4 attached Tesla S2050s. These are now very old and unlikely to be much use except for testing purposes.
* ramius has a single Tesla K40c. ramius is a private machine.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA.
9325cb64ae7c784c9fa7c6254574f62e1b4912bb
513
512
2018-03-10T20:34:42Z
Mjh
2
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1 is the main cluster gpu machine. It is currently the only machine in the <tt>gpu</tt> queue. The attached GPUs are 6 Tesla K80 units.
* smp1 has 4 attached Tesla S2050s. These are now very old and unlikely to be much use except for testing purposes.
* ramius has a single Tesla K40c. ramius is a private machine.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA.
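For example, to build or run against a particular CUDA installation you would typically add its directories to your paths in your shell or job script; the version number below is illustrative and the exact layout may differ, so check what is available under /soft first:
<pre>
export PATH=/soft/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/soft/cuda-9.0/lib64:$LD_LIBRARY_PATH
nvcc --version    # confirm the toolkit is being picked up
</pre>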
Note:
* At the moment, there is no provision for preventing contention between users of the Tesla unit. That is, if more than one user runs a job that tries to use it at one time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
It may be sensible to bind your host-side processes to cores in the socket that is physically connected to the GPU's PCI bus, using the standard Linux process-affinity tools (for example <tt>taskset</tt> or <tt>numactl</tt>). This will depend on your application and is currently left up to users.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* The user needs to start an X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* Start the application, which should request an OpenGL context, make use of it, write its results unattended and quit when it is done (a render job, for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
a687cf141aca053139a460abe5ba8263d64ab777
WEAVE
0
72
519
2018-03-15T16:25:17Z
Mjh
2
Created page with "Access to the UH HPC facility is available to members of the WEAVE consortium under an agreement between WEAVE and the University of Hertfordshire. == Obtaining access == If..."
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE consortium under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE consortium you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow.
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
[Dan to add]
8873efadaf96a5191c7cae21e07becd1eb027ccd
520
519
2018-03-15T16:26:50Z
Mjh
2
/* Types of usage */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE consortium under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE consortium you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
[Dan to add]
1cd281b36b3d7e0823bcbddad38bebdce9540021
AIPS
0
27
521
267
2018-03-15T17:05:23Z
Mjh
2
wikitext
text/x-wiki
AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips .
To use aips you will need to be in the aipsuser group.
From the head node, use an [[interactive jobs|interactive job]] to get a session on the machine you want to use. You can use any compute node or the [[SMP machines]]. Be sure to use the -X option to get X11 forwarding. Then do <tt>/soft/aips/START_AIPS tv=local da=STRI_CLUSTER tpok</tt>.
You may choose your own AIPS number; if you clash with someone else, you'll probably notice.
== Parseltongue ==
To use Parseltongue for scripting AIPS do
<pre>setenv PYTHONPATH /soft/Obit/python
source /soft/AIPS/LOGIN.CSH
/soft/parseltongue/bin/ParselTongue
</pre>
bae77eab55d1f0d3ed6b1d948fe253c9f59f903c
User:Dsmith
2
73
522
2018-03-16T13:37:23Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
My name is Dan Smith and I am an astronomer. I own four combs of different sizes, and never use any of them. Is that fifty words yet? 1+1 = 2, 6+2 = 8.
My name is Dan Smith and I am an astronomer. I own four combs of different sizes, and never use any of them. Is that fifty words yet? 1+1 = 2, 6+2 = 8.
e8cf40d0f60667db9a862dd5123a129c3aa73b73
User talk:Dsmith
3
74
523
2018-03-16T13:37:24Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 13:37, 16 March 2018 (UTC)
d9421985af77a839f9de6afe0ee93fdb9763aae2
User:Ptaylor
2
75
524
2018-03-16T13:37:48Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
Based at ANU in Canberra working with Christoph Federrath, previously at UH working with Chiaki Kobayashi.
Understanding galaxy evolution using cosmological simulations, with a focus on the influence of AGN feedback.
I am also constantly working to improve numerical modelling of AGN feedback by incorporating results from small-scale simulations of BH-driven jets.
3b38a1ec75946c3c4c664aabd3df15eb178d8f41
User talk:Ptaylor
3
76
525
2018-03-16T13:37:49Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 13:37, 16 March 2018 (UTC)
d9421985af77a839f9de6afe0ee93fdb9763aae2
WEAVE
0
72
526
520
2018-03-16T14:02:10Z
Dsmith
15
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE consortium you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, for a standard field containing 2,000 targets might look something like this:
How to put an example .csh file here?
For details of how to submit a job, look at the [[jobs]] page.
66adac3add8f3e98b77fa7c7730b645434050f70
527
526
2018-03-16T14:27:57Z
Dsmith
15
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
cd /path/to/your/xml/input/file
configure --gui 0 --field test_field.xml --output test_field_configured.xml
</pre>
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
fd32c1d8c5d452d3292f881acee92726ba8f39b5
528
527
2018-03-16T17:28:14Z
Dsmith
15
/* Running configure */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8GB of memory for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=8gb
cd /path/to/your/xml/input/file
configure --gui 0 --field test_field.xml --output test_field_configured.xml
END
</pre>
To help you estimate the resources that you're likely to need, and how this might vary dependent on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
19563af7f040fd0ab87b5c927edc78259303053c
529
528
2018-03-16T21:17:37Z
Mjh
2
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8GB of memory for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=8gb
cd /path/to/your/xml/input/file
configure --gui 0 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary dependent on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
2fa87fb8906b0fc12c8d67bb982813ac7b7cbaf4
530
529
2018-03-17T12:17:19Z
Dsmith
15
/* Running configure */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8GB of memory for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=8gb
cd /path/to/your/xml/input/file
configure --gui 0 --threads 8 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary dependent on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
4b523432a748ca3188a9addf1a3cce652d123e0d
531
530
2018-03-21T11:46:19Z
Dsmith
15
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending an e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at PATH TBD. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8GB of memory for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=8gb
cd /path/to/your/xml/input/file
configure --gui 0 --threads 8 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary depending on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
5b6cc323031b152f8aa36b6194a3603e03798ba3
533
531
2018-03-22T12:23:21Z
Dsmith
15
/* Running configure */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending an e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at /soft/configure/configure. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8GB of memory for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=8gb
cd /path/to/your/xml/input/file
/soft/configure/configure --gui 0 --threads 8 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary depending on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
847fd5493dd3e810f1aa86d6f1636ad59d80f35a
534
533
2018-03-22T15:25:24Z
Dsmith
15
/* Running configure */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending an e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use).
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at /soft/configure/configure. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8 GB of memory in total (the <tt>pmem</tt> resource specifies the memory allocated per core, here 1 GB for each of the 8 cores), for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=1gb
cd /path/to/your/xml/input/file
/soft/configure/configure --gui 0 --threads 8 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary depending on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
429b4585a7f312c8ed5911d7dee7d6d368fd738c
556
534
2018-06-20T19:48:28Z
Mjh
2
/* Obtaining access */
wikitext
text/x-wiki
Access to the UH HPC facility is available to members of the WEAVE Survey Working Group (SWG) under an agreement between WEAVE and the University of Hertfordshire.
== Obtaining access ==
If you are a member of the WEAVE SWG you may request access by sending an e-mail to Martin Hardcastle (<tt>m.j.hardcastle@herts.ac.uk</tt>). Please specify the username you would like to use (which will be allocated to you if not already in use). It will speed the process up if you could also confirm in the e-mail that you accept the [[Terms of use]] of the facility.
== Terms of access ==
The access provided is on the basis that you will be using the facility only for work directly related to WEAVE SWG activities. It does not allow you to use the facility as a general HPC facility for other purposes. General access is only available through a UH-based collaborator.
== Types of usage ==
By default you will be using the facility on the same terms as all other users. That means you can log on and submit jobs as described on the [[Main Page]] which will be processed as resources allow. Depending on your needs, you may want to use [[jobs|batch jobs]] or [[interactive jobs]].
If you have a particular task that you need to do where you know in advance that you will need a lot of CPU time (e.g. in the build up to a deadline for OpR3b) then you can ask for a [[reservations|reservation]] to be made for you or for the WEAVE group in general. So, for example, you could ask for two 32-core nodes to be reserved for a month and you would then have access to those nodes through the [[jobs|job control system]]. Reservations are time-limited and the start and end period of the reservation need to be agreed in advance.
== Running configure ==
The latest version of configure is installed at /soft/configure/configure. Do not run configure on the login nodes. A basic script to run configure using 8 cores on a single compute node, with 8 GB of memory in total (the <tt>pmem</tt> resource specifies the memory allocated per core, here 1 GB for each of the 8 cores), for a standard field containing 2,000 targets might look something like this:
<pre>
#!/bin/csh
#PBS -N configure_example
#PBS -q main
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:30:00
#PBS -l pmem=1gb
cd /path/to/your/xml/input/file
/soft/configure/configure --gui 0 --threads 8 --field test_field.xml --output test_field_configured.xml
</pre>
To help you estimate the resources that you're likely to need, and how this might vary depending on the degree of clustering in your target data, take a look at
[https://star.herts.ac.uk/~dsmith/configure_resources_used.jpg?dl=0 these plots]
For details of how to submit a job, and what the other commands in this example mean, take a look at the [[jobs]] page.
4c00c86e96bed9ccc2212dafcdabb234edb91976
AIPS
0
27
532
521
2018-03-22T11:50:57Z
Mjh
2
wikitext
text/x-wiki
AIPS is software for radio astronomy data reduction. It is installed on the cluster in /soft/aips .
To use aips you will need to be in the aipsuser group.
From the head node, use an [[interactive jobs|interactive job]] to get a session on the machine you want to use. You can use any compute node or the [[SMP machines]]. Be sure to use the -X option to get X11 forwarding. Then do <tt>/soft/aips/START_AIPS tv=local da=STRI_CLUSTER tpok</tt>.
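One way to get such a session (a sketch only; the resource request is illustrative, and see [[interactive jobs]] for full details) is:
<pre>
qsub -I -X -l nodes=1:ppn=1 -l walltime=4:00:00
</pre>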
You may choose your own AIPS number; if you clash with someone else, you'll probably notice.
== Parseltongue ==
To use Parseltongue for scripting AIPS do
<pre>setenv PYTHONPATH /soft/Obit/python
source /soft/aips/LOGIN.CSH
/soft/parseltongue/bin/ParselTongue
</pre>
844db4180b6832567975fc6a227bc02e8ba0030b
Storage
0
8
535
515
2018-03-29T13:54:02Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space, since large datasets are routinely being processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 1 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 560 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 144 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Use of the relevant subdirectory indicates which allocation you believe you are entitled to use. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
fd54262b7d2364d2d5069bc3a065893e250c3a18
541
535
2018-04-30T11:26:41Z
Mjh
2
/* System-wide NFS storage */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 560 TB of beegfs storage nominally distributed as follows:
* 180 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 144 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Use of the relevant subdirectory indicates which allocation you believe you are entitled to use. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
00c610a102b7693cfa625901bb11580f1e7404f6
542
541
2018-04-30T11:29:29Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 650 TB of beegfs storage nominally distributed as follows:
* 270 TB: general use, under /beegfs/general
* 146 TB: CAR, under /beegfs/car
* 144 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Use of the relevant subdirectory indicates which allocation you believe you are entitled to use. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
1f99d48ac255f71ad460b7a1e316b80d031a7a47
554
542
2018-05-11T08:09:31Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 740 TB of beegfs storage nominally distributed as follows:
* 360 TB: general use, under /beegfs/general
* 145 TB: CAR, under /beegfs/car
* 145 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Use of the relevant subdirectory indicates which allocation you believe you are entitled to use. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
99dd57eb012958b10fee1a5953665b26438391c9
565
554
2018-12-24T08:14:14Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets will be processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
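For example (a minimal sketch: replace <tt>your_username</tt> with your own username; the link name <tt>scratch</tt> is just an illustration):
<pre>
mkdir /beegfs/general/your_username
# optional: a convenience link so that you can work relative to /home
ln -s /beegfs/general/your_username ~/scratch
</pre>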
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 740 TB of beegfs storage nominally distributed as follows:
* 360 TB: general use, under /beegfs/general
* 145 TB: CAR, under /beegfs/car
* 231 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Use of the relevant subdirectory indicates which allocation you believe you are entitled to use. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
bc34cf55d93f845c9fb3b7d5db815ff1141472eb
LOFAR-UK Compute Facility
0
57
536
500
2018-04-09T17:02:56Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the UH HPC facility reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 256 GB RAM and 16 cores, plus reservations on some number (up to 12) of compute nodes mostly with 64 GB RAM and 16 cores. (Some machines have 192 or 256 GB RAM.) Reservations on the more powerful [[SMP machines]] can be made if necessary or you can compete with other users on the main cluster.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated areas <tt>/data/lofar/</tt> or <tt>/beegfs/lofar</tt>.
Data processing should generally be carried out on the compute nodes. When you have a particular project to do, ask for a reservation of enough of the LOFAR nodes to allow you to do what you need to do. When you're finished, let us know so that the reservation can be released. (Local, i.e. Herts-based, users need to submit jobs with the option <tt>-W group_list=lofar</tt> to make use of the reservation. External LOFAR-UK users do not.)
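For example, a Herts-based user might submit a job under the LOFAR reservation with something like the following (the script name and resource request are illustrative):
<pre>
qsub -W group_list=lofar -l nodes=1:ppn=16 -l walltime=12:00:00 myjob.sh
</pre>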
All the standard cluster commands work on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive [[jobs]], as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[Herts LOFAR HBA pipeline]] is available.
A description of the [[generic pipeline]] is available.
b11f45f42e02f8f85f222f8128796903c9c776ef
Fair share
0
39
537
263
2018-04-29T08:33:51Z
Mjh
2
wikitext
text/x-wiki
There are various systems in place to try to make sure that everyone gets a fair share of the cluster (particularly the main cluster) and that a mixture of long and short, large and small jobs can be run.
Jobs are run based on a priority assigned by the scheduler. The priority takes account of a number of factors:
* Jobs that involve more CPUs intrinsically have higher priority. This is to stop single-CPU work pushing out large jobs.
* Jobs that have been in the queue for longer have higher priority, up to a maximum of 64 idle jobs per user (after that, the jobs will wait in the queue but will not accrue priority)
* Jobs from users who have not recently significantly used the cluster have priority over those from users who have. (This is called the fair-share mechanism: the 'fair share amount' is a weighted integral of usage over the last week, weighted towards more recent use.)
In addition, by default,
* no user can have more than 400 processors' worth of jobs running at once. This is intended to stop a single user monopolizing the cluster
* no user can have a processor-time product that exceeds 1 week x 128 nodes running at any given time. This is intended to stop large long jobs blocking shorter jobs.
These policies apply to all queues but are tailored to the main queue where the competition is highest. If they cause you problems, please contact the [[administrators]]. We are happy to review policies to try to get the fairest result for everyone, and we can relax the default requirements if you have a particular need for more resources.
b5940a5858c7b60f3bfe57a72297b95c3f2d4fda
553
537
2018-05-10T10:51:57Z
Mjh
2
wikitext
text/x-wiki
There are various systems in place to try to make sure that everyone gets a fair share of the cluster (particularly the main cluster) and that a mixture of long and short, large and small jobs can be run.
Jobs are run based on a priority assigned by the scheduler. The priority takes account of a number of factors:
* Jobs that involve more CPUs intrinsically have higher priority. This is to stop single-CPU work pushing out large jobs.
* Jobs that have been in the queue for longer have higher priority, up to a maximum of 64 idle jobs per user (after that, the jobs will wait in the queue but will not accrue priority)
* Jobs from users who have not recently significantly used the cluster have priority over those from users who have. (This is called the fair-share mechanism: the 'fair share amount' is a weighted integral of usage over the last week, weighted towards more recent use.)
In addition, by default,
* no user can have more than 512 processors' worth of jobs running at once. This is intended to stop a single user monopolizing the cluster
* no user can have a processor-time product that exceeds 1 week x 128 cores running at any given time. This is intended to stop large long jobs blocking shorter jobs.
These policies apply to all queues but are tailored to the main queue where the competition is highest. If they cause you problems, please contact the [[administrators]]. We are happy to review policies to try to get the fairest result for everyone, and we can relax the default requirements if you have a particular need for more resources.
d2f6192c15c11346900054c989ed86f5a813ba76
Terms of use
0
77
538
2018-04-29T09:43:50Z
Mjh
2
Created page with "Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions. * Access to UHHPC is..."
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
52fe230325556184cd2ccff98798ab30612f58fb
539
538
2018-04-29T09:46:37Z
Mjh
2
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]].
8313faef8e763fb4d82aa6bd7c58347243508f78
546
539
2018-05-02T13:49:02Z
Mjh
2
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details. Your details will be stored securely on the cluster itself and will be used both for a cluster mailing list and to contact you in case of problems with your use of the cluster.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]].
a1a57b1e7ca2cba726265b6a8364071ff3da38e6
547
546
2018-05-06T21:40:11Z
Mjh
2
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details. Your details will be stored securely on the cluster itself and will be used both for a cluster mailing list and to contact you in case of problems with your use of the cluster.
* The administrators may take whatever actions they feel necessary to ensure the continued operation and security of the facility, which may include inspecting any data or programs stored on the cluster.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]].
f19da811178691b698638cd51b0c9ea4c6875108
548
547
2018-05-10T10:19:58Z
Mjh
2
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details. Your details will be stored securely on the cluster itself and will be used both for a cluster mailing list and to contact you in case of problems with your use of the cluster.
* The administrators may take whatever actions they feel necessary for troubleshooting or to ensure the smooth operation and security of the facility, which may include inspecting any data or programs stored on the cluster.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]].
7d7928b112fedf279dcc3803ac3ce838bfc1b80b
549
548
2018-05-10T10:34:24Z
Mjh
2
Protected "[[Terms of use]]" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite))
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details. Your details will be stored securely on the cluster itself and will be used both for a cluster mailing list and to contact you in case of problems with your use of the cluster.
* The administrators may take whatever actions they feel necessary for troubleshooting or to ensure the smooth operation and security of the facility, which may include inspecting any data or programs stored on the cluster.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]].
7d7928b112fedf279dcc3803ac3ce838bfc1b80b
Quota
0
38
540
256
2018-04-30T11:25:38Z
Mjh
2
wikitext
text/x-wiki
Use of space on <tt>/home</tt> is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files on <tt>/home</tt>.
The current default quota for all users is 50 GB. When you reach 49 GB, you will be warned and given a period (1 week) in which your usage should be reduced below 49 GB; if you fail to reduce usage in this period, or if your usage reaches 50 GB, new file creation will be blocked.
The quota is ''not'' an indication of expected reasonable use for a cluster user. You should try to keep your use of <tt>/home</tt> as low as possible.
If you believe you have a specific need to exceed this quota, please contact the cluster [[administrators]].
There is no quota on the various data areas (see [[Storage]]) and these are the locations where it is appropriate to store large volumes of data.
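To check your current usage against the quota you can run, for example (assuming the standard Linux quota tools are available; <tt>du</tt> always works as a fallback):
<pre>
quota -s      # show quota usage and limits in human-readable form
du -sh ~      # total size of your home directory
</pre>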
0aed4b60f2544eed36502db8c4a3a44ccfc79b39
Jobs
0
9
543
507
2018-04-30T22:00:56Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the job's standard output and/or standard error streams should be kept (mirrored) in your home directory while the job runs. If you specify this, you can watch the output as it is generated, but it will always appear in your home directory. If you don't, the output is delivered to the locations given by -o and -e (or to the directory from which you ran qsub, if these are not specified), but only once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=64 job.sh ## run for two hours on 64 CPUs (which may be spread over up to 64 physical nodes)
qsub -l walltime=1:0:0 -l nodes=2:ppn=32 job.sh ## run for one hour on 2 nodes with 32 CPUs per node
qsub -l walltime=0:1:0 -l nodes=1:ppn=2 -l pmem=8gb job5.sh ## Run two processes on one node for one minute, requesting 8 GB of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. Do not attempt to use these options unless you know what you are doing.)
Note also that in the present configuration requests for fewer processors per node than are available may be consolidated onto fewer nodes. This should never be a problem unless you are specifically testing inter-node communication, in which case you could specify node names to force running on different nodes. For efficiency, you are recommended always to ask for and use the maximum number of processes per node for MPI or multi-threaded code -- that is, 32 processes for the main cluster. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
The Maui tool <tt>showq</tt> can also be used to get a view of the whole queue, which is sometimes more user-friendly:
<pre>
/usr/local/maui/bin/showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
1765 mjh Running 128 1:15:20 Fri May 7 13:38:20
1766 mjh Running 128 1:15:20 Fri May 7 13:38:20
1767 mjh Running 128 1:15:20 Fri May 7 13:38:20
1768 mjh Running 128 1:15:20 Fri May 7 13:38:20
1769 mjh Running 128 1:15:20 Fri May 7 13:38:20
5 Active Jobs 640 of 640 Processors Active (100.00%)
80 of 80 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1770 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1771 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1772 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1773 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1774 mjh Idle 128 5:00:00 Fri May 7 13:38:20
1775 mjh Idle 128 5:00:00 Fri May 7 13:38:20
6 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 11 Active Jobs: 5 Idle Jobs: 6 Blocked Jobs: 0
</pre>
Finally, you can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>.
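For example (the job ID and values are illustrative):
<pre>
qdel 1770                          # remove job 1770 from the queue
qalter -l walltime=10:00:00 1770   # change the walltime request of a queued job
qmove main 1770                    # move job 1770 to the main queue
</pre>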
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
# csh-style scripts:
setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`
# sh-style scripts:
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
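A minimal sketch of such a generic script (the program name and file naming convention are hypothetical):
<pre>
#!/bin/sh
#PBS -N generic
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
cd $PBS_O_WORKDIR
./my_program --input data_${NAME}.txt
</pre>
You could then run it on different inputs with <tt>qsub -v NAME=fred myjob.sh</tt>, <tt>qsub -v NAME=jane myjob.sh</tt>, and so on.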
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
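As a sketch of how a script might use the variable (the program and output naming are hypothetical):
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=6:00:00
cd $PBS_O_WORKDIR
./my_simulation --seed $PBS_ARRAYID --output run_${PBS_ARRAYID}.dat
</pre>
Submitted with <tt>qsub -t 1-4 myjob.qsub</tt>, this queues four independent runs, each writing to its own output file.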
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its exit status).
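Since <tt>qsub</tt> prints the ID of the job it creates, you can chain jobs in a shell script without typing IDs by hand (the script names here are placeholders):
<pre>
FIRST=`qsub stage1.sh`
qsub -W depend=afterany:$FIRST stage2.sh
</pre>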
aad3bca4dfba71e2377a3c1830f0053e04e11cf7
Architecture
0
7
544
518
2018-04-30T22:05:10Z
Mjh
2
/* compute nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: old login node, now deprecated
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in two racks, rack1 and rack2. The smp machines are outside any particular rack or chassis.
* 64 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-064: rack1, rack2), in the main queue
* 16 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 12 GB RAM and QDR Infiniband form part of the CAIR cluster (node065-80: chassis 5), in the cair_l queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAIR cluster (node081-92: chassis 6), in the cair_l and cair_s queues
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-9, file servers providing the BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
72aeb4df24dd68d7e0fc40e646539cc8f044ce78
563
544
2018-12-24T08:11:40Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: web and job servers
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in two racks, rack1 and rack2. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-9, file servers providing the BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
1ba9683c3561bbd028aeb34f7322e8dff15f5036
564
563
2018-12-24T08:12:02Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, stri-cluster, with 2 4-core Xeons and 32 GB RAM: web and job servers
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in two racks, rack1 and rack2. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-10, file servers providing the BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
f57da34d4263958e3f345bc4e7a50fee7fcfc347
Networking
0
10
545
463
2018-04-30T22:12:21Z
Mjh
2
wikitext
text/x-wiki
The nodes are linked by Gigabit ethernet and Infiniband networks.
The ethernet network is a fairly conventional one; each chassis (see [[architecture]]) has an internal ethernet switch to which all nodes in that chassis are connected. These switches, together with the head node, are connected together via a main ethernet switch. The links connecting the head node and the SMP machines, and the uplinks from the two node internal ethernet stacks, are carried by multiple physical cables via link aggregation. A separate physical ethernet network carries management traffic.
The infiniband network is slightly more complex. Each chassis or rack (see [[Architecture]]) has an internal infiniband switch and these are all linked via two main infiniband switches, one FDR14, one QDR. The main cluster nodes in rack1 and rack2 use FDR14 infiniband (56 Gb/s), and chassis9 uses FDR10 (40 Gb/s); all other machines on the network have QDR infiniband cards (40 Gb/s).
The networking arrangements should be taken into consideration when running jobs where fast communication is important -- latency is lower and data transfer rates are somewhat higher between nodes in the same chassis or rack than between different chassis, and ethernet connections are higher-latency and lower-bandwidth still. Best results will be obtained for IPC if jobs run in the same chassis or rack. The scheduler is aware of this and will try to ensure that jobs do not span more than one rack or chassis.
None of the nodes are on the public Internet: they all have private IP addresses starting 192.168.... These addresses are not routable from anywhere else in the University. They can, however, all see the public Internet via IP masquerading through the head node through the ethernet network. (For this reason, it is not sensible to try to make high-volume accesses to the Internet from the nodes.)
Each node has an IP address corresponding to its ethernet port (192.168.2.xxx where xxx is the node number) and one corresponding to the Infiniband port (192.168.3.xxx). Within the cluster, you may use nodexxx.data (e.g. node001.data) and nodexxx.infi (node001.infi) to refer to these two networks. High-volume traffic should use the Infiniband network.
The SMP machines have addresses smp1.data, smp1.infi etc.
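As an illustrative sketch (it simply assumes the standard <tt>getent</tt> and <tt>ping</tt> utilities are available on the head nodes), you can check which address a given name resolves to before directing high-volume traffic at it:
<pre>
# resolve the ethernet and infiniband (IPoIB) addresses of node001
getent hosts node001.data
getent hosts node001.infi
# quick reachability check over the infiniband network
ping -c 1 node001.infi
</pre>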
f6856ca713a195c7c9a950450f4b222a3a0b8bc1
Administrators
0
6
550
483
2018-05-10T10:37:43Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
These are currently:
* Vito Graffagnino, v.graffagnino@herts.ac.uk (x3358, room 1E71 Innovation Centre).
* Martin Hardcastle, m.j.hardcastle@herts.ac.uk (x3409, room 2E71 Innovation Centre).
UH staff and students should see Vito (or failing that Martin) to get an account. External consortium users should contact Martin in the first instance.
654fae6d371ca7ebef2a3defb258c59ddab39027
Accounts
0
3
551
479
2018-05-10T10:48:36Z
Mjh
2
wikitext
text/x-wiki
To get an account, contact the [[administrators]].
Accounts are available to all staff and research students of UH, and to others by special arrangement.
Access is granted subject to the [[Terms of use]] of the cluster and to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
059676120394ae0ef21938d682920f4cd8fd0a2e
Main Page
0
1
552
510
2018-05-10T10:50:22Z
Mjh
2
/* Cluster basics */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Terms of use]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[GPUs]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
== Known problems ==
* [[Known problems]]
889281d9ac3c941ae1ed684a0ddd013b2f23d8a9
Why doesn't my job run?
0
37
555
353
2018-05-11T16:29:38Z
Mjh
2
wikitext
text/x-wiki
If you submit a [[jobs|job]] and it stays in the queue, it may not be clear why nothing is happening. Obviously one possibility is that there is a problem with the cluster: but it is also possible that things are going on that you are not seeing.
To ask the scheduler what is going on with a job you can use the command <tt>/usr/local/maui/bin/checkjob</tt>. A typical <tt>checkjob</tt> run might look like this (note the <tt>-v</tt> option):
<pre>
/usr/local/maui/bin/checkjob -v 123456
checking job 123456 (RM job '123456.stri-cluster.herts.ac.uk')
State: Idle
Creds: user:fred group:fred class:main qos:DEFAULT
WallTime: 00:00:00 of 7:00:00:00
SubmitTime: Fri Jul 8 09:04:48
(Time Queued Total: 00:38:52 Eligible: 00:38:52)
Total Tasks: 24
Req[0] TaskCount: 24 Partition: ALL
Network: [NONE] Memory >= 1024M Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [main]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeAccess: SHARED
TasksPerNode: 8 NodeCount: 3
IWD: [NONE] Executable: [NONE]
Bypass: 63 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 24.00 StartPriority: 2513
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 24 procs found)
idle procs: 732 feasible procs: 0
Rejection Reasons: [Features : 58][CPU : 22][State : 18][ReserveTime : 8]
Detailed Node Availability Information:
node001 rejected : ReserveTime
node002 rejected : ReserveTime
node003 rejected : ReserveTime
node004 rejected : State
node005 rejected : ReserveTime
node006 rejected : ReserveTime
node007 rejected : ReserveTime
node008 rejected : ReserveTime
node009 rejected : ReserveTime
node010 rejected : CPU
node011 rejected : CPU
node012 rejected : CPU
node013 rejected : State
node014 rejected : CPU
node015 rejected : CPU
node016 rejected : CPU
node017 rejected : State
node018 rejected : State
node019 rejected : State
node020 rejected : State
node021 rejected : State
node022 rejected : State
node023 rejected : State
node024 rejected : State
node025 rejected : State
node026 rejected : State
node027 rejected : State
node028 rejected : State
node029 rejected : State
node030 rejected : State
node031 rejected : State
node032 rejected : CPU
node033 rejected : CPU
node034 rejected : CPU
node035 rejected : CPU
node036 rejected : CPU
node037 rejected : CPU
node038 rejected : CPU
node039 rejected : CPU
node040 rejected : CPU
node041 rejected : State
node042 rejected : CPU
node043 rejected : CPU
node044 rejected : CPU
node045 rejected : CPU
node046 rejected : CPU
node047 rejected : CPU
node048 rejected : CPU
node049 rejected : Features
node050 rejected : Features
node051 rejected : Features
node052 rejected : Features
node053 rejected : Features
node054 rejected : Features
node055 rejected : Features
node056 rejected : Features
node057 rejected : Features
node058 rejected : Features
node059 rejected : Features
node060 rejected : Features
node061 rejected : Features
node062 rejected : Features
node063 rejected : Features
node064 rejected : Features
node065 rejected : Features
node066 rejected : Features
node067 rejected : Features
node068 rejected : Features
node069 rejected : Features
node070 rejected : Features
node071 rejected : Features
node072 rejected : Features
node073 rejected : Features
node074 rejected : Features
node075 rejected : Features
node076 rejected : Features
node077 rejected : Features
node078 rejected : Features
node079 rejected : Features
node080 rejected : Features
sandbox1 rejected : Features
sandbox2 rejected : Features
sandbox3 rejected : Features
sandbox4 rejected : Features
sandbox5 rejected : Features
sandbox6 rejected : Features
sandbox7 rejected : Features
sandbox8 rejected : Features
sandbox9 rejected : Features
sandbox10 rejected : Features
node081 rejected : Features
node082 rejected : Features
node083 rejected : Features
node084 rejected : Features
node085 rejected : Features
node086 rejected : Features
node087 rejected : Features
node088 rejected : Features
node089 rejected : Features
node090 rejected : Features
node091 rejected : Features
node092 rejected : Features
node093 rejected : Features
node094 rejected : Features
node095 rejected : Features
node096 rejected : Features
job cannot run in partition SMP (insufficient idle procs available: 0 < 24)
</pre>
How do you interpret all this output?
First of all, if your output does not look anything like this, for example if you see an error message about not being able to contact a server, then the scheduler is not running or there is some other problem: see [[Known problems]]. If you need help in this situation, contact one of the [[administrators]].
Assuming your output looks like the above, first of all you should check that the details of your job agree with what you think you submitted. Check <tt>NodeCount</tt> and <tt>TasksPerNode</tt> and the <tt>WallTime</tt> request. You may also want to check the output of <tt>qstat -f <jobid></tt>.
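For example, a minimal sketch of pulling the job state and requested resources out of the Torque output, using the job ID from the example above:
<pre>
# show the current state and the resources actually requested for job 123456
qstat -f 123456 | grep -E 'job_state|Resource_List'
</pre>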
Now go down to the reason why <tt>job cannot run in partition DEFAULT</tt>. Normally, this will be as above: this is telling you that there are not enough available CPUs for your job. But suppose you think the cluster is close to empty. Why is this?
Look down the list of individual nodes to see why each one is rejecting your job. The example above shows the common reasons:
* Features: this means that you have asked for a feature that the node in question doesn't have. For example, jobs submitted to the main cluster will never run on the [[sandbox]] machines. It is normal for many nodes to reject your job for this reason.
* State: usually this means that the node in question is marked busy by the system, which means that it is running as many jobs as possible. In this example, there are a number of busy nodes. It can also mean that nodes are down, i.e. that there is a real problem. If many nodes reject your job because of 'State' but you think they cannot be busy, check <tt>pbsnodes -l</tt> to see if they are 'down' and report a problem if so. Nodes that are 'offline' in <tt>pbsnodes -l</tt> have been taken offline by the administrators for maintenance and there is no need to report them unless you think this is an error.
* CPU: the node is not busy, but it has less available CPU than you have requested. This is most likely because it is running one or more jobs. In the example above, the user has asked for 3 nodes with 8 CPUs each: the nodes rejecting the job because of CPU do not have 8 free CPUs. (If all nodes are rejecting your job for this reason, you should check whether you are asking for an impossible configuration. Jobs submitted to the main cluster asking for more than 32 cores per node will sit there forever waiting for a machine with a suitable number of cores to become available. Does your job really need 3 machines with 8 CPUs, or would it work with 24 CPUs distributed over the cluster?)
* ReserveTime: your job asks to run on the node at a time when it is reserved for another purpose. This could be due to another job ahead of you in the queue, or it could be a system reservation. If you see this message but there are no jobs ahead of you in the queue, check for a recent e-mail from the administrators regarding scheduled downtime.
If you are seeing all of these and you can convince yourself that the individual nodes are rejecting your jobs for valid reasons, then the only thing you can do is wait for the conditions that are blocking your job to clear. Only if you can't convince yourself of this should you contact the administrators.
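For the 'State' checks mentioned above, a short sketch of the relevant commands (illustrative only; node004 is simply one of the nodes rejected with 'State' in the example listing):
<pre>
# nodes currently marked down or offline by the system
pbsnodes -l
# full state report for a single node, e.g. one that rejected your job with 'State'
pbsnodes node004
</pre>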
0f6f69e1c7c68e01bfdff4b85fbb99a4e1745262
Cluster bibliography
0
30
557
476
2018-07-25T09:24:53Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
0b8fb4ce3803ca9c11be84c233c4c1cb9518bcf7
559
557
2018-07-27T05:26:28Z
Ptaylor
16
Updated with P Taylor's papers.
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
1255bdc910ca743a430557203b098e6af177cdb4
Gromacs
0
19
558
477
2018-07-25T09:50:04Z
Akukol
3
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is a software package for molecular dynamics simulation. It treats molecules as particles in a classical-mechanics force field.
== How to perform a simulation with Gromacs' mdrun ==
1) You need to prepare the binary simulation start file (tpr file) either on your local Linux machine or on the headnode of the cluster (a minimal sketch is given after these steps). If you prepare it on the local machine, make sure you use Gromacs version 2018.
In order to run Gromacs on the headnode for preparation, it is a good idea to put the following into your .cshrc file:
source /soft/gromacs-2018/bin/GMXRC (or /soft/gromacs-2018-gpu/bin/GMXRC for GPU preparation)
export LD_LIBRARY_PATH="soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
2) Prepare a shell script (e.g. runjob.sh) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: chmod +x runjob.sh
4) Submit the job to the cluster: qsub runjob.sh
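A minimal sketch of step 1 on the headnode (the input file names <tt>md.mdp</tt>, <tt>conf.gro</tt> and <tt>topol.top</tt> are placeholders for your own files, not files provided on the cluster):
<pre>
# set up the Gromacs 2018 environment and build the binary start file (sketch)
source /soft/gromacs-2018/bin/GMXRC
gmx grompp -f md.mdp -c conf.gro -p topol.top -o run.tpr
</pre>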
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Look here for [[groperform|optimising performance]]. There are two versions of Gromacs 2018.2, for non-GPU and GPU use, located in /soft/gromacs-2018 and /soft/gromacs-2018-gpu respectively.
Note that all GPUs attached to the node are used automatically. The maximum walltime is 48 hours.
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
''Andreas/Hershna''
'''For GPU:'''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q gpu
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the gpu machine on the cluster
# uses 1 GPU machine
# set a maximum time of forty eight hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# make the output files available while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2018-gpu/bin/GMXRC
# specify working directory:
cd /home/hpatel/gromacsGPU
export LD_LIBRARY_PATH="soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
--------------
For non-GPU use, Gromacs is optimised for the newer nodes that contain 32 cores. In order to make sure that the job runs on these nodes, you have to request them with #PBS -l nodes=1:ppn=32. An example of a job script is shown below:
'''Without use of GPU:'''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=32
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the main cluster
# set a maximum time of forty eight hours (walltime)
# merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# make the output files available while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2018/bin/GMXRC
# specify working directory:
cd /home/hpatel/gromacs
export LD_LIBRARY_PATH="soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
32806858290e126aa813ebee02c3892482fa8354
LOFAR
0
47
560
451
2018-09-30T08:43:44Z
Mjh
2
wikitext
text/x-wiki
You will need a <tt>.casarc</tt> file; something like this should do:
<pre>
measures.DE200.directory: /soft/casapy/data/ephemerides
measures.DE405.directory: /soft/casapy/data/ephemerides
measures.line.directory: /soft/casapy/data/ephemerides
measures.sources.directory: /soft/casapy/data/ephemerides
measures.comet.directory: /soft/casapy/data/ephemerides
measures.ierseop97.directory: /soft/casapy/data/geodetic
measures.ierspredict.directory: /soft/casapy/data/geodetic
measures.tai_utc.directory: /soft/casapy/data/geodetic
measures.igrf.directory: /soft/casapy/data/geodetic
measures.observatory.directory: /soft/casapy/data/geodetic
</pre>
Then a full setup for up-to-date versions of the LOFAR software looks something like this:
<pre>
bash
source /soft/lofar-270618/init.sh
</pre>
The LOFAR software is frequently updated. For the most up-to-date versions, look for <tt>/soft/lofar-date</tt>, where date is a numeric build date, and source <tt>/soft/lofar-date/lofarinit.csh</tt> instead.
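A quick way to see which builds are currently installed (a sketch; the build dates listed will of course change over time):
<pre>
# list the available LOFAR builds
ls -d /soft/lofar-*
</pre>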
9cd5009eb1d9202410f651e2099df2b145653d30
Software
0
17
561
471
2018-10-05T07:19:13Z
Mjh
2
wikitext
text/x-wiki
This page documents locations of software.
Detailed local documentation of software should go in a page specific to that software.
* <u>[[Gromacs]]</u>: 2016 (with GPU acceleration) installed in <tt>/soft/gromacs-2016-gpu</tt>
* Autodock <u>[[Vina]]</u>: 1.1.1 installed in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* <u>[[Autodock]]</u>: 4.2 installed in <tt>/soft/autodock</tt>
* <u>[[iGemDock]]</u>: 2.1 installed in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* <u>[[AIPS]]</u>: 31DEC10 installed in <tt>/soft/aips</tt>
* <u>[[CASA]]</u>: installed in <tt>/soft/casapy...</tt>
* <u>[[IDL]]</u>: in <tt>/soft/idl/idl/bin</tt>
* <u>[[Matlab]]</u>: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* <u>[[Starlink]]</u>: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* <u>[[LOFAR]]</u>: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* <u>[[Python packages]]</u>: set your PYTHONPATH to include <tt>/soft/python/lib64/python2.7/site-packages</tt>
* <u>[[neuron]]</u>: in <tt> /soft/nrn</tt>
* <u>[[Miriad]]</u>: in <tt> /soft/miriad</tt>
* <u>[[ciao]]</u>: in <tt>/soft/ciao-x.x</tt>
cf7c44178b56e87b7d5be6e46d16be4b16ee72c6
To do
0
78
562
2018-12-24T08:08:52Z
Mjh
2
Created page with "= To do, downtime early 2019 = * Node software update * Firmware update node001-080 * Replace old stri-cluster * Infiniband tidy * Remove remaining 2TB dot hill kit * beegfs..."
wikitext
text/x-wiki
= To do, downtime early 2019 =
* Node software update
* Firmware update node001-080
* Replace old stri-cluster
* Infiniband tidy
* Remove remaining 2TB dot hill kit
* beegfs servers to new location
* beegfs upgrade to v7
* Tidy stack 3
* node070 hardware issue
* labelling
* torque upgrade?
6b1bbba01b726cffc1a9d21890de36cb4b82a6de
566
562
2019-01-01T19:11:02Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
* Node software update
* Firmware update node001-080
* Replace old stri-cluster
* Infiniband tidy
* Remove remaining 2TB dot hill kit
* beegfs servers to new location
* beegfs upgrade to v7
* Tidy stack 3
* node070 hardware issue
* labelling
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
1dc946cf09bc363efcce012a6e3923307253ee88
568
566
2019-01-28T11:28:40Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* Infiniband tidy
* Remove remaining 2TB dot hill kit
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue
* labelling
== Software ==
* Node software update
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
38747b08431438a6a307dd4fa1f9d2839c7696db
569
568
2019-01-28T11:43:39Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* Infiniband tidy
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue
* labelling
== Software ==
* Node software update
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
38d3d857856e471c0551db66a62fc525fa51e0f6
570
569
2019-01-28T12:05:14Z
Mjh
2
/* Hardware */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* Infiniband tidy
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
6d5be3d9026943324a5b2e03840f42d290f63786
571
570
2019-01-28T12:05:32Z
Mjh
2
/* Software */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* Infiniband tidy
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 upgrade
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
20eccc61059f45927a4b44be64802c8ef9860135
572
571
2019-01-28T15:10:30Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
cd3bc17bbd979be56754d134e03d0319a1f08168
573
572
2019-01-28T16:27:41Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* log in to node
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
f7840dc18c3e2601bae89182effec7199a364f09
Python packages
0
49
567
347
2019-01-11T14:55:11Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, just run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>.
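A minimal sketch for a bash-style shell (adjust the syntax if you use csh; the paths are the ones quoted above):
<pre>
# Python 2.7 site packages
export PYTHONPATH=/soft/python/lib64/python2.7/site-packages:$PYTHONPATH
# for Python 3.6, use this path instead:
# export PYTHONPATH=/soft/python3/usr/local/lib64/python3.6/site-packages:$PYTHONPATH
# check that a globally installed package is picked up
python -c 'import astropy; print(astropy.__version__)'
</pre>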
7b8be0dfba5909f5783268f54aa6605e0f463e5e
To do
0
78
574
573
2019-01-28T16:39:44Z
Mjh
2
/* Notes */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* Tidy stack 3
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* log in to nodexxx.management
* select 'launch virtual console'
* if popups blocked, allow and launch again
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
7d13b064f1941ff949062f76acba5e2cb22a56e4
575
574
2019-01-28T16:44:26Z
Mjh
2
/* Hardware */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* log in to nodexxx.management
* select 'launch virtual console'
* if popups blocked, allow and launch again
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
f79c4e761c5455fa851c5de4765efacd12ef81cd
576
575
2019-01-29T08:33:07Z
Mjh
2
/* Notes */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
964b840d57d4e836116a2972bc21847b8e955850
577
576
2019-01-29T17:41:20Z
Mjh
2
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* DNS
* NTP server/broadcast
* Torque
* Maui
* Web server
* Mariadb
* haproxy ssh
a1b6b415617e7fa4f1283dd703bf0c558ac1e613
578
577
2019-01-29T17:42:32Z
Mjh
2
/* Services that need to work on new head node */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* DNS
* NTP server/broadcast
* Torque
* Maui
* Web server
* Mariadb
* haproxy ssh
* Munin
* Ganglia
1d389a46e0d84337ff630ec3a6f93560fbbcd4c0
579
578
2019-01-29T17:47:11Z
Mjh
2
/* Services that need to work on new head node */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server</s>/broadcast
* Torque
* Maui
* Web server
* Mariadb
* haproxy ssh
* Munin
* Ganglia
38e387e38c802992fdb1abe0cc6010d47bec901d
580
579
2019-01-30T17:41:48Z
Mjh
2
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server</s>/broadcast
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* haproxy ssh
* Munin
* Ganglia
* DHCP
92789397138b3f40e033336ae1b6a7f473553dd3
585
580
2019-01-31T14:22:28Z
Mjh
2
/* Services that need to work on new head node */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* Node software update
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* torque upgrade?
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server</s>/broadcast
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* Munin
* Ganglia
* <s>DHCP</s>
* <s>NFS</s>
d343b64da77e23f72735bbd3786fd045b0f066d4
586
585
2019-01-31T14:22:59Z
Mjh
2
/* Software */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* <s>Node software update</s>
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* <s>torque upgrade</s>
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server</s>/broadcast
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* Munin
* Ganglia
* <s>DHCP</s>
* <s>NFS</s>
a214b167b46fc51389d7053671d56508effe7a69
587
586
2019-01-31T14:23:28Z
Mjh
2
/* Services that need to work on new head node */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* Replace old stri-cluster
* beegfs servers to new location
* node070 hardware issue -- DIMM B6
* labelling
== Software ==
* <s>Node software update</s>
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* <s>torque upgrade</s>
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server/broadcast</s>
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* <s>LDAP</s>
* Munin
* Ganglia
* <s>DHCP</s>
* <s>NFS</s>
7e67509c07cc2e5a93fe1c40253c700dbda8a8cb
588
587
2019-01-31T18:52:00Z
Mjh
2
/* Hardware */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* <s>Replace old stri-cluster</s>
* beegfs servers to new location
* <s>node070 hardware issue -- DIMM B6</s>
* labelling
== Software ==
* <s>Node software update</s>
* node024 reinstall
* Firmware update node001-080
* beegfs upgrade to v7
* <s>torque upgrade</s>
* all new nodes to LOFAR-capable
* sort out fstabs on all nodes
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server/broadcast</s>
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* <s>LDAP</s>
* Munin
* Ganglia
* <s>DHCP</s>
* <s>NFS</s>
35ec31c784f9f611344d443b4f551734dc58ac19
589
588
2019-01-31T18:52:39Z
Mjh
2
/* Software */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* <s>Replace old stri-cluster</s>
* beegfs servers to new location
* <s>node070 hardware issue -- DIMM B6</s>
* labelling
== Software ==
* <s>Node software update</s>
* node024 reinstall
* <s>Firmware update node001-080</s>
* <s>beegfs upgrade to v7</s>
* <s>torque upgrade</s>
* <s>all new nodes to LOFAR-capable</s>
* <s>sort out fstabs on all nodes</s>
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server/broadcast</s>
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* <s>LDAP</s>
* Munin
* Ganglia
* <s>DHCP</s>
* <s>NFS</s>
be79fb95241bdb8a5ca4b6fd223d2056d66df5fd
590
589
2019-02-01T13:07:47Z
Mjh
2
/* To do, downtime early 2019 */
wikitext
text/x-wiki
= To do, downtime early 2019 =
== Hardware ==
* <s>Replace old stri-cluster</s>
* beegfs servers to new location
* <s>node070 hardware issue -- DIMM B6</s>
* labelling
== Software ==
* <s>Node software update</s>
* <s>node024 fix</s> (but not updated firmware)
* <s>Firmware update node001-080</s>
* <s>beegfs upgrade to v7</s>
* <s>torque upgrade</s>
* <s>all new nodes to LOFAR-capable</s>
* <s>sort out fstabs on all nodes</s>
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== Services that need to work on new head node ==
* <s>DNS</s>
* <s>NTP server/broadcast</s>
* <s>Torque</s>
* <s>Maui</s>
* <s>Web server</s>
* <s>Mariadb</s>
* <s>haproxy ssh</s>
* <s>LDAP</s>
* Munin
* Ganglia
* Licence servers
* <s>DHCP</s>
* <s>NFS</s>
df02d4b5692a1d1fccdd6e7020ce1514930dbaf6
Queues
0
15
581
516
2019-01-30T21:20:48Z
Mjh
2
wikitext
text/x-wiki
There are six job queues available for general use on the system:
* 'main' is the default queue: this submits to the main cluster. The maximum wall time on this queue is 1 week.
* 'gpu' submits to the GPU nodes. The maximum wall time on this queue is 48 hours.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'cair_l' submits to the dedicated CAIR nodes. This queue is restricted to CAIR users.
* 'car' submits to the dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
* 'forecast' submits to the dedicated air quality forecast nodes.
== Default wall times ==
The default wall time for all the queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 GB. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
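For example, a sketch of the <tt>#PBS</tt> directives for overriding these defaults in a job script (the numbers are placeholders to adjust for your own job; see [[jobs]] for complete scripts):
<pre>
#PBS -q main
#PBS -l walltime=72:00:00    # ask for 3 days rather than the 24-hour default
#PBS -l nodes=2:ppn=16       # more than the default single node
#PBS -l pmem=4gb             # more memory per process than the 1 GB default (see the memory page)
</pre>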
2dad18acdc355d1ba066bc821fe371789f311324
Architecture
0
7
582
564
2019-01-31T09:51:43Z
Mjh
2
/* head/login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in two racks, rack1 and rack2. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-10, file servers providing the BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
9d8f84c2156de54cefd5212f9faf9754cbc61e91
583
582
2019-01-31T09:52:30Z
Mjh
2
/* compute nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-10, file servers providing the BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
ff64ce9e34d00cd76c7ed2a4511a0dcba76189c0
584
583
2019-01-31T09:53:33Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis5-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 cores (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-10, file servers providing 830 TB of BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
99675ce8e597e0a24d455581609840c88671f0e3
Singularity
0
79
591
2019-03-02T17:51:53Z
Mjh
2
Created page with "Singularity ([https://github.com/sylabs/singularity]) is installed on the cluster. This is our preferred way of running containerized applications, including Docker containers..."
wikitext
text/x-wiki
Singularity ([https://github.com/sylabs/singularity]) is installed on the cluster. This is our preferred way of running containerized applications, including Docker containers.
You need /soft/bin on your path to run singularity. You will probably want to use the --bind option to bind data directories such as /beegfs into the container.
db2b92247647fd13c3bd0f3189887335ba1cda4e
603
591
2019-10-10T09:27:03Z
Mjh
2
wikitext
text/x-wiki
Singularity ([https://github.com/sylabs/singularity]) is installed on the cluster. This is our preferred way of running containerized applications, including Docker containers.
You need /soft/bin on your path to run singularity. You will probably want to use the --bind option to bind data directories such as /beegfs into the container.
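For example, a minimal sketch (the container image name, bind path and command are illustrative):
<pre>
export PATH=/soft/bin:$PATH
# run a command inside the container, with a BeeGFS directory bound in
singularity exec --bind /beegfs/general/your_username mycontainer.sif ./myscript.sh
</pre>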
Note that singularity images can't be built on BeeGFS (they can be stored there once built). This will affect users converting from Docker images. If this causes you problems please contact the [[administrators]].
4da1cb60267910271ff7e00acbbc9718383fc536
Local disk space
0
48
592
341
2019-03-06T10:53:31Z
Mjh
2
wikitext
text/x-wiki
The main compute nodes have a limited amount of local disk space (around 700 GB for node001-080 and 110 GB for node081-144). This area is mounted on /local and is only visible internally to the node.
The normal expectation is that you will use the central [[storage]] for file operations, including data processing. However, there may be circumstances where it will be useful to copy some data to the nodes, do some I/O intensive operations on it and copy it back to the storage. In this case you may use the /local area.
If you want to do this, to avoid interfering with other jobs:
* You ''must'' reserve the maximum amount of space that your job will use using the <tt>file</tt> option to <tt>qsub</tt>; e.g.
<pre>qsub -l nodes=1,file=10gb</pre>
* You must create a directory in /local in which your job will work as part of your <tt>qsub</tt> script, which will be unique to your job. For example, you might want to do
<pre>
mkdir /local/$PBS_JOBID
cd /local/$PBS_JOBID
</pre>
* You must only work in this directory, and the total filespace you use must not exceed the reserved amount.
* Before it exits, your job must clear up the filespace it used; no files must be left in <tt>/local</tt> (see the example script below).
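Putting these rules together, a minimal sketch of a suitable job script (the paths, sizes and program names are illustrative):
<pre>
#!/bin/sh
#PBS -N local-scratch-demo
#PBS -l nodes=1,file=10gb
#PBS -l walltime=01:00:00
# work in a directory unique to this job
mkdir /local/$PBS_JOBID
cd /local/$PBS_JOBID
# copy input data in, process it, copy the results back to central storage
cp /beegfs/general/myusername/input.dat .
/home/myusername/myprogram input.dat output.dat
cp output.dat /beegfs/general/myusername/
# clean up: no files must be left in /local
cd /
rm -rf /local/$PBS_JOBID
</pre>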
Note that these rules do not apply to the <tt>/scratch</tt> directories on the [[SMP machines]].
dfbbbd4421cf03e278123f18b097fbd2c2c9920a
Cluster bibliography
0
30
593
559
2019-04-12T09:40:31Z
Ptaylor
16
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
d59357ee41959977210e4fd0224679e1d3ad56a3
604
593
2019-10-18T08:27:51Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. (2019). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. Virology, 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
1844071043499bc6b3f64745d85f6e7447907dae
605
604
2019-10-18T08:29:32Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. ('''2019'''). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. Virology, 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A, Consensus virtual screening approaches to predict protein ligands ('''2011'''), ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
13c24bb1d527a21b31c810b89b279fd9c0f20a0d
606
605
2019-10-18T08:31:21Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. ('''2019'''). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. Virology, 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''), Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011''') Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A ('''2011''') Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
7cf98feef2db6b80a67329e24bd976e670d47953
607
606
2019-10-18T08:32:07Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. ('''2019'''). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. Virology, 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''). Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011'''). Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A ('''2011'''). Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
006df77d674a734de91ff5eb385ad5895328bb10
608
607
2019-10-18T08:33:36Z
Akukol
3
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Patel, H., & Kukol, A. ('''2019'''). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. ''Virology'', 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
*Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
*Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
*Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
*Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''). Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011'''). Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A ('''2011'''). Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published in ''BMC Neuroscience'' '''11''', P92
e8927305a38de88ae9a8f149b1203c42db7789bd
MPI
0
12
594
310
2019-05-14T19:50:46Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/lib64/mpich2/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/lib64/mpich2/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node).
=== MVAPICH2 ===
MVAPICH2 uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available. This does not work on the smp machines (the hardware is incompatible); use e.g. MPICH2 there instead, or use the main queue.
To use MVAPICH2 do
<pre>
module unload mpi/mpich-x86_64
module load mvapich2
</pre>
Then you should see
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-2.1a/bin/mpicc
</pre>
<tt>/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec</tt> works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== OpenMPI ===
<tt>module load mpi/openmpi-x86_64</tt> and then proceed as above.
60b09cf46f96e7dd812d0a15174a6bf84cacb64a
595
594
2019-05-14T19:52:06Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster. All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All use of MPI on the cluster must be integrated with the [[jobs|job control system]] and this page describes how to do that. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
=== MPICH2 ===
MPICH2 is the default implementation since it is provided as a package within Fedora. If you take no action, your MPI code will be compiled with MPICH2. Check which implementation you're using like this:
<pre>
> which mpicc
/usr/lib64/mpich2/bin/mpicc
</pre>
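To build an MPI program against the currently selected implementation, use its compiler wrapper; a minimal sketch (the source file name is illustrative):
<pre>
# compiles and links against the MPICH2 libraries currently on the path
mpicc -O2 -o mympijob mympijob.c
</pre>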
MPICH2 is '''not''' Infiniband-aware. The communication between processes will use the Ethernet network, and so the latency and total available bandwidth of IPC will be about an order of magnitude worse than it would be with Infiniband-aware implementations (see below).
To run MPICH2 jobs, your job control system script should call <tt>/usr/lib64/mpich2/bin/mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/lib64/mpich2/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key line is the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=16:ppn=8 set, as in the example above, there will be 16*8=128 MPI processes (8 per node).
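If you want to run fewer processes than the default, you can pass the number explicitly with the standard <tt>-n</tt> option; a sketch:
<pre>
# run 64 MPI processes instead of one per allocated processor
/usr/lib64/mpich2/bin/mpiexec -n 64 /home/myusername/mympijob
</pre>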
=== MVAPICH2 ===
MVAPICH2 uses native Infiniband system calls to communicate. This means that the latency of IPC can be as low as a few microseconds and the full bandwidth of the Infiniband interfaces is available. This does not work on the smp machines (the hardware is incompatible); use e.g. MPICH2 there instead, or use the main queue.
To use MVAPICH2 do
<pre>
module unload mpi/mpich-x86_64
module load mvapich2
</pre>
Then you should see
<pre>
> which mpicc
/usr/mpi/gcc/mvapich2-2.1a/bin/mpicc
</pre>
<tt>/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec</tt> works for starting MVAPICH2 jobs:
<pre>
#!/bin/sh -f
#PBS -N mvapich2-demo
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
=== OpenMPI ===
This should also be Infiniband-aware.
<pre>module load mpi/openmpi-x86_64</pre> and then proceed as above.
62c480ef28be291fe9f05a474add53a61d4e5800
GPUs
0
71
596
513
2019-06-17T20:06:05Z
Mjh
2
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1 is the main cluster gpu machine. It is currently the only machine in the <tt>gpu</tt> queue. The attached GPUs are 6 Tesla K80 units.
* ramius has a single Tesla K40c. ramius is a private machine.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA. You can do e.g. <tt>module load cuda-10.0</tt> to make sure the libraries and compilers are on your path.
Note:
* At the moment, there is no provision for preventing contention between users of the same GPU. That is, if more than one user runs a job that tries to use a given GPU at the same time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
It may be sensible to bind your host-side process to cores in sockets that are physically connected to the GPU's PCI bus, using the Linux process-affinity commands (e.g. <tt>taskset</tt> or <tt>numactl</tt>). How to do this depends on your application and is currently left up to users.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
== Tensorflow ==
Tensorflow packages are available under python3.6. Make sure CUDA 10.0 is loaded (see above) and <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt> is on your PYTHONPATH.
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* User needs to start X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* Start the application, which will need to request an OpenGL context, make use of it, output its results unattended, and quit when it is done (a render job, for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
594247ee14f7c73df15d65136f010b222bc3db4d
610
596
2019-12-12T14:22:07Z
Mjh
2
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1: The attached GPUs are 6 Tesla K80 units with 16 GB VRAM.
* gpu2 and gpu3: These both have 2 Tesla V100 units with 16 GB VRAM.
* ramius has a single Tesla K40c.
ramius is a private machine; the other machines are accessible through the gpu queue.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA. You can do e.g. <tt>module load cuda-10.0</tt> to make sure the libraries and compilers are on your path.
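For example, to check that the toolkit is on your path and compile a CUDA source file (a minimal sketch; the file name is illustrative):
<pre>
module load cuda-10.0
which nvcc
# build a simple CUDA program
nvcc -O2 -o mykernel mykernel.cu
</pre>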
Note:
* At the moment, there is no provision for preventing contention between users of the same GPU. That is, if more than one user runs a job that tries to use a given GPU at the same time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
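A hedged sketch of such a submission (the core count shown is illustrative; check the actual core count of the target GPU host, e.g. with <tt>pbsnodes</tt>):
<pre>
# reserve one whole node in the gpu queue; replace 32 with the
# number of cores on the GPU host you intend to use
qsub -q gpu -l nodes=1:ppn=32 myjob.sh
</pre>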
It may be sensible to bind your host-side process to cores in sockets that are physically connected to the GPU's PCI bus, using the Linux process-affinity commands (e.g. <tt>taskset</tt> or <tt>numactl</tt>). How to do this depends on your application and is currently left up to users.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
== Tensorflow ==
Tensorflow packages are available under python3.6. Make sure CUDA 10.0 is loaded (see above) and <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt> is on your PYTHONPATH.
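A minimal sketch of setting this up and checking that TensorFlow can see a GPU (the exact check may vary with the installed TensorFlow version):
<pre>
module load cuda-10.0
export PYTHONPATH=/soft/python3/usr/local/lib64/python3.6/site-packages:$PYTHONPATH
# report whether TensorFlow has found a usable GPU
python3.6 -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
</pre>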
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* User needs to start X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* Start the application, which will need to request an OpenGL context, make use of it, output its results unattended, and quit when it is done (a render job, for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
95d545d678f3fbc13d63fc3418973d8ca5bffe53
Storage
0
8
597
565
2019-07-03T06:35:42Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ BeeGFS] file system for distributed storage, mounted at /beegfs. Files stored here are distributed over a number of servers in a way that is transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 830 TB of BeeGFS storage nominally distributed as follows:
* 360 TB: general use, under /beegfs/general
* 145 TB: CAR, under /beegfs/car
* 231 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
11f3ea4f95581d5c118a2692a3afb375a77e20c5
598
597
2019-07-19T10:30:34Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ BeeGFS] file system for distributed storage, mounted at /beegfs. Files stored here are distributed over a number of servers in a way that is transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 951 TB of BeeGFS storage nominally distributed as follows:
* 485 TB: general use, under /beegfs/general
* 145 TB: CAR, under /beegfs/car
* 231 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
9a5fa8cfd99bdf0c3a3ff338738ecb69cd600786
611
598
2020-01-20T12:25:05Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
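For example, to create your working area on the distributed storage (a minimal sketch, assuming your login name is available in $USER):
<pre>
mkdir /beegfs/general/$USER
</pre>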
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
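For example, a sketch of such a symbolic link (the link name <tt>data</tt> is illustrative):
<pre>
# make the BeeGFS working area reachable as ~/data
ln -s /beegfs/general/$USER ~/data
</pre>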
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ BeeGFS] file system for distributed storage, mounted at /beegfs. Files stored here are distributed over a number of servers in a way that is transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 1.1 PB of BeeGFS storage nominally distributed as follows:
* 485 TB: general use, under /beegfs/general
* 272 TB: CAR, under /beegfs/car
* 231 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
By using one of these subdirectories you are asserting that you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is formally backed up: you must take responsibility for your own backups.
/home is, however, copied by a nightly rsync to another location on the cluster, so if you delete a crucial file we ''may'' have a useful copy (ask straight away).
2f31da734eae4053a433b38cbe4327ac83cb01b6
Administrators
0
6
599
550
2019-08-02T14:39:46Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). This includes a request for initial account creation.
External users, such as external consortium users, should send e-mail to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
923e5226768800a85466a2d29cfd98b868291d3e
600
599
2019-08-02T14:40:40Z
Mjh
2
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). This includes a request for initial account creation. Please make sure that your e-mail or help request includes 'UHHPC' in the subject line to make sure that it is routed through to the correct team.
External users, such as external consortium users, should send e-mail to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
df5c9679e378ac46b8a7cfa84dae10eb85387c5b
601
600
2019-08-05T17:25:13Z
Mjh
2
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). This includes a request for initial account creation. Please make sure that your e-mail or help request includes 'UHHPC' in the subject line to make sure that it is routed through to the correct team. If you are asking for account creation, it will save time if you mention that you accept the [[terms of use]].
External users, such as external consortium users, should send e-mail to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
6da2c60dc649b9bddb9712ed6b9b831551e3f1f4
Shell
0
80
602
2019-09-20T06:01:36Z
Mjh
2
Created page with "For historical reasons, the default shell on UHHPC is tcsh. The chsh command does not work on the cluster. If you wish to switch to bash, you must ask the [[administrators]]..."
wikitext
text/x-wiki
For historical reasons, the default shell on UHHPC is tcsh.
The chsh command does not work on the cluster. If you wish to switch to bash, you must ask the [[administrators]] to make the change.
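If you are unsure which shell your account currently has, one quick check (a sketch; any equivalent command will do) is:
<pre>
# print the login shell recorded for your account
getent passwd $USER | cut -d: -f7
</pre>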
f87749be9b11c162c35af6ad2ba333df6800d9b8
LOFAR-UK Compute Facility
0
57
609
536
2019-10-22T15:34:10Z
Mjh
2
wikitext
text/x-wiki
The LOFAR-UK compute facility is a part of the UH HPC facility reserved for LOFAR-UK use. It consists of a dedicated login machine and file server with 256 GB RAM and 16 cores, one dedicated compute node with 256 GB RAM and 32 cores, plus competitive access to the machines in the main cluster. If a big data analysis task is planned, a reservation can be made to ensure unrestricted access to computing power.
LOFAR-UK users can get an account by contacting Martin Hardcastle. This will give them the ability to log in to <tt>lofar.herts.ac.uk</tt>. Data can be downloaded to the dedicated area <tt>/beegfs/lofar</tt>.
Data processing should generally be carried out on the compute nodes rather than on lofar.herts.ac.uk. You can do your analysis by running interactive or non-interactive [[jobs]], as appropriate.
See the [[LOFAR]] page for information on LOFAR software.
A description of the [[generic pipeline]] is available.
241e6960cc4b92cb245eb865ec1034df5900c4a1
Software
0
17
614
561
2020-03-27T11:24:29Z
Mjh
2
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[CUDA]]: see [[GPU machines]]
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* <u>[[Gromacs]]</u>: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
b2326246c25869672449c34bbd1ae4c1bed62264
615
614
2020-03-27T11:28:45Z
Mjh
2
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* <u>[[Gromacs]]</u>: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
01a80beb2aea399e4ff37e38d991ebd2f1afcdeb
616
615
2020-03-27T11:30:13Z
Mjh
2
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* <u>[[Gromacs]]</u>: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
21beed65a52fc17ed27140c4f0a3fe409774ff48
619
616
2020-03-29T14:05:29Z
H.patel
14
/* Molecular dynamics */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* <u>[[Gromacs]]</u>: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
c398b8ba7ac9acc9a5dc3e7352c4ee19b189cf59
User:Mayaahorton
2
81
617
2020-03-27T13:45:33Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
Postgrad in computational astrophysics, working part-time on UHHPC support.
5c02a2b9df9c753269c3c9586e3a274bfc033078
User talk:Mayaahorton
3
82
618
2020-03-27T13:45:33Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 13:45, 27 March 2020 (UTC)
5011cc87da6369a07d18c77e7bda9efcd46acd3c
NAMD
0
83
620
2020-03-29T14:16:22Z
H.patel
14
Created page with "NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Enhanced sampling techniques such as accelerated MD simulati..."
wikitext
text/x-wiki
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Enhanced sampling techniques such as accelerated MD simulations can be performed using NAMD.
To run NAMD, a shell script needs to be prepared as shown below and edited according to your needs. Submit the job to the cluster with the qsub command, e.g. qsub runjobNAMD.sh
<i>Hershna Patel</i>
752ea7dc63192eca2cc70bf8946c1b1704fd6010
621
620
2020-03-29T14:24:49Z
H.patel
14
wikitext
text/x-wiki
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Enhanced sampling techniques such as accelerated MD simulations can be performed using NAMD.
To run NAMD, a shell script needs to be prepared as shown below and edited according to your needs. Submit the job to the cluster with the qsub command, e.g. qsub runjobNAMD.sh
<i>Hershna Patel</i>
<pre>#!/bin/sh
#PBS -N NamdTest
#PBS -q gpu1
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -u hpatel
# -N runs a job with name 'NamdTest' on the gpu machine on the cluster
# -q job starts up on gpu1
# -l set a maximum time of forty eight hours (wall-clock time)
# -j merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# -u specifies user 'hpatel'
# set required path:
source /soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA
# specify working directory:
cd /home/hpatel/...
### This is the command ###
./namd2 +idlepoll +p16 configfile.namd > output.log
### command end ###
# start with 'qsub runjobNAMD.sh'
</pre>
6104771a050868ef7245a99909e5ee0c9a862660
622
621
2020-03-29T14:25:36Z
H.patel
14
wikitext
text/x-wiki
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Enhanced sampling techniques such as accelerated MD simulations can be performed using NAMD.
To run NAMD, a shell script needs to be prepared as shown below and edited according to your needs. Submit the job to the cluster with the qsub command, e.g. qsub runjobNAMD.sh
<i>Hershna Patel</i>
<pre>#!/bin/sh
#PBS -N NamdTest
#PBS -q gpu1
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -u hpatel
# -N runs a job with name 'NamdTest' on the gpu machine on the cluster
# -q job starts up on gpu1
# -l set a maximum time of forty eight hours (wall-clock time)
# -j merge 'output' and 'standard error' and output both to 'standard output' (-j oe)
# -u specifies user 'hpatel'
# set required path:
source /soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA
# specify working directory:
cd /home/hpatel/...
### This is the command ###
./namd2 +idlepoll +p16 configfile.namd > output.log
### command end ###
# start with 'qsub runjobNAMD.sh'
</pre>
fdbe36965a58ab312e299c7c768b48ef029fe417
623
622
2020-03-29T14:27:59Z
H.patel
14
wikitext
text/x-wiki
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Enhanced sampling techniques such as accelerated MD simulations can be performed using NAMD.
To run NAMD v2.13, a shell script needs to be prepared as shown below and edited according to your needs. Submit the job to the cluster with the qsub command, e.g. qsub runjobNAMD.sh
<i>Hershna Patel</i>
<pre>#!/bin/sh
#PBS -N NamdTest
#PBS -q gpu1
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -u hpatel
# -N names the job 'NamdTest'
# -q submits the job to the gpu1 queue (the GPU machine on the cluster)
# -l requests one node with 16 processors and a maximum of forty-eight hours of wall-clock time
# -j oe merges 'standard error' into 'standard output'
# -u specifies user 'hpatel'
# put the NAMD installation on the path:
export PATH=/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA:$PATH
# specify working directory:
cd /home/hpatel/...
### This is the command ###
namd2 +idlepoll +p16 configfile.namd > output.log
### command end ###
# start with 'qsub runjobNAMD.sh'
</pre>
579aa47aee79dabf4e29ce5785982c211e5342a2
R
0
84
624
2020-04-01T13:02:53Z
Mjh
2
Created page with "R is installed on head node and compute node machines by default. To use system-wide installations of packages set <tt>R_LIBS</tt> to <tt>/soft/R</tt>."
wikitext
text/x-wiki
R is installed on head node and compute node machines by default.
To use system-wide installations of packages set <tt>R_LIBS</tt> to <tt>/soft/R</tt>.
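For example (tcsh is the default [[shell]]; the bash equivalent is shown as well):
<pre>
# tcsh
setenv R_LIBS /soft/R
# bash / sh
export R_LIBS=/soft/R
</pre>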
7b2b3fad4f8142b6445ecbabdcfa9f6aa1bdd04b
Jobs
0
9
625
543
2020-04-10T10:12:31Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred until later. If other people's jobs are running, you are likely to have to wait. It is therefore important to request only the minimum resources you need: a job that needs only 8 nodes may be able to run immediately, while a job that requests all nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output as it is generated, but it will appear in your home directory rather than in the job's working directory. If you don't, the output will be stored in the directory specified by -o and -e (or in the current working directory of qsub if these are not specified), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=64 job.sh ## run for two hours on 64 CPUs (which may be spread over up to 64 physical nodes)
qsub -l walltime=1:0:0 -l nodes=2:ppn=32 job.sh ## run for one hour on 2 nodes with 32 CPUs per node
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## Run two processes on one node for one minute, requesting 8 GB of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. Do not attempt to use these options unless you know what you are doing.)
Note also that in the present configuration requests for fewer processors per node than are available may be consolidated onto fewer nodes. This should never be a problem unless you are specifically testing inter-node communication, in which case you could specify node names to force running on different nodes. For efficiency, we recommend that you always ask for and use the maximum number of processes per node for MPI or multi-threaded code -- that is, 32 processes per node for the main cluster. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
You can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>. Look at the <tt>man</tt> pages for these commands for more information.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh would be suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication (IPC) works see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs. (This is csh/tcsh syntax; in an <tt>sh</tt>/<tt>bash</tt> script such as the examples above, use <tt>export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`</tt> instead.)
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
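As a sketch (the script and program names here are purely illustrative), a generic script driven by such a variable might look like:
<pre>
#!/bin/sh
#PBS -N generic
#PBS -l nodes=1
#PBS -l walltime=1:0:0
# NAME is supplied at submission time, e.g.  qsub -v NAME=fred generic.sh
cd /beegfs/general/$USER
./my-analysis ${NAME}.dat > ${NAME}.log
</pre>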
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to use the -t option wrongly.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
For a large array you have the option to limit the number of jobs that will run concurrently -- perhaps because they all want access to IO resources and will compete with each other and run out of walltime if they all run at once. So
<pre>
qsub -t 1-1000%20 myjob.qsub
</pre>
will run 1000 versions of the job but ensure that only 20 are running at a given time.
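A minimal example of a script that uses <tt>$PBS_ARRAYID</tt> to vary its behaviour (file and program names illustrative):
<pre>
#!/bin/sh
#PBS -N array-example
#PBS -l nodes=1
#PBS -l walltime=1:0:0
cd /beegfs/general/$USER
# each array element processes its own input file and writes its own output
./my-simulation input_${PBS_ARRAYID}.dat > output_${PBS_ARRAYID}.log
</pre>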
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its output status).
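Since <tt>qsub</tt> prints the identifier of the job it has just submitted, you can capture that identifier to chain jobs without knowing the job number in advance; a sketch (script names illustrative):
<pre>
JOBID=`qsub stage1.sh`
qsub -W depend=afterany:$JOBID stage2.sh
</pre>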
5282c269bedac59ea7b0a6888a06856682992eb2
Architecture
0
7
626
584
2020-04-10T10:13:16Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* dstorage1-10, file servers providing 830 TB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
3ade3eb3d24dd126324ea0b58bb2a315675ec0d0
627
626
2020-04-10T10:13:42Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4).
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 830 TB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
1efd3e1434d7e0aeb888bf0df29b58adeea2fe6d
628
627
2020-04-10T10:14:20Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4). smp4 is dedicated to LOFAR-UK use.
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 830 TB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
4b90ca254c302a344c61b649ce12df8b17962627
629
628
2020-04-10T10:14:34Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4). smp4 is dedicated to LOFAR-UK use.
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* A [[GPUs|GPU]] machine, gpu1, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 1.1 PB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
e1fa733580142d2c5a27ff32ab78efc53e9810bf
630
629
2020-04-10T10:15:16Z
Mjh
2
/* compute nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form part of the CAR cluster (node097-node112: chassis 7), in the car queue
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the rest of the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 (one 4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4). smp4 is dedicated to LOFAR-UK use.
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* Three [[GPUs|GPU]] machines, gpu1-3, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 1.1 PB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
33c2a5d3aa11266bdc18d0ab440eb687b4d8c4c8
654
630
2020-09-30T18:14:00Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 2 head nodes: headnode1 and headnode2, each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
Older compute nodes are in 16-blade chassis: currently chassis6-9 are populated. The newer compute nodes, which will be used by most people, are in three racks, rack1-3. The smp machines are outside any particular rack or chassis.
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband form part of the Main cluster (node001-080: rack1, rack2, rack3), in the main queue
* 12 Xeons (X5650s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAIR cluster (node081-92: chassis 6), in the cair_l queue
* 2 Xeons (E5520s) 2 socket x 4-core (no hyperthreading) with 24 GB RAM and DDR Infiniband form part of the main cluster (node093-94: chassis 6), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband form part of the Main cluster (node097-112: rack4) in the test queue.
* 16 Xeons (X5675s) 2 socket x 6-core (no hyperthreading) with 24 GB RAM and QDR Infiniband form the CAR cluster (node113-128: chassis 8), in the car queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband form the rest of the main cluster (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[SMP machines]], two with 48 cores (4 sockets x 12 cores, Opteron 6174) and two with 32 cores (4 sockets x 8 cores, Xeon 4620), all with 256 GB RAM and QDR Infiniband (smp1-4). smp4 is dedicated to LOFAR-UK use.
* Two SMP blades in chassis6 (node095-096) with 32 cores (2 sockets x 16 cores) and 256 GB RAM, in the smp queue
* Three [[GPUs|GPU]] machines, gpu1-3, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 1.1 PB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
7610cc6fc7ac1c275f0cce5e0a3f7d0f8029e5fa
Ciao
0
66
631
436
2020-04-15T17:52:22Z
Mjh
2
wikitext
text/x-wiki
CIAO is the Chandra data reduction software.
Access it by doing <tt>source /soft/ciao-4.12/ciao-4.12/bin/ciao.csh</tt>
dce835a76a9e460bf5b91419222e7a457f5c01c9
Software
0
17
632
619
2020-04-15T17:53:14Z
Mjh
2
/* Astronomical software */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* <u>[[Gromacs]]</u>: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
d3c94d11ed660e21a4ca525668bd5030d72e9261
640
632
2020-06-10T17:38:34Z
Mjh
2
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]]: in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]]: in <tt>/soft/bin/singularity</tt>
b555f59de096091d54e551e6ae7aa15a537acb24
641
640
2020-06-10T17:38:46Z
Mjh
2
/* Astronomical software */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]]: in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]]: in <tt>/soft/bin/singularity</tt>
a83230b44600bc39ab65a8502ca644353ac2df6b
642
641
2020-06-10T17:39:29Z
Mjh
2
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]]: in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]]: in <tt>/soft/bin/singularity</tt>
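For example, to pick up one of these from an interactive login shell (a minimal sketch, assuming the module name and paths listed above):
<pre>
# load a newer GCC from the module list above and check the version
module load gcc-6.4
gcc --version

# tools installed under /soft can be run via their full path
/soft/bin/singularity --version
</pre>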
76ccae74b69e46d6aa4a72061e54dcfb298026c5
SAS
0
85
633
2020-04-15T20:23:01Z
Mjh
2
Created page with "SAS is the XMM data reduction software. To run SAS you must first have HEADAS set up. In bash <pre> export HEADAS=/soft/heasoft-6.27.1/x86_64-pc-linux-gnu-libc2.17 . $HEADAS..."
wikitext
text/x-wiki
SAS is the XMM data reduction software.
To run SAS you must first have HEADAS set up.
In bash
<pre>
export HEADAS=/soft/heasoft-6.27.1/x86_64-pc-linux-gnu-libc2.17
. $HEADAS/headas-init.sh
</pre>
In tcsh
<pre>
setenv HEADAS /soft/heasoft-6.27.1/x86_64-pc-linux-gnu-libc2.17
source $HEADAS/headas-init.csh
</pre>
Then source SAS:
<pre>
source /soft/xmmsas_20190531_1155/setsas.sh
</pre>
or
<pre>
source /soft/xmmsas_20190531_1155/setsas.csh
</pre>
Set the <tt>SAS_CCFPATH</tt> variable to <tt>/beegfs/car/XMM/CCF</tt>.
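Putting this together, a complete bash setup might look like the following (a sketch based on the paths above; run it in the shell or job script from which you will use SAS):
<pre>
# set up HEADAS first
export HEADAS=/soft/heasoft-6.27.1/x86_64-pc-linux-gnu-libc2.17
. $HEADAS/headas-init.sh

# then source SAS and point it at the calibration files
source /soft/xmmsas_20190531_1155/setsas.sh
export SAS_CCFPATH=/beegfs/car/XMM/CCF
</pre>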
4cc37cf948f7a85c531bad24dbe008e4f9045009
Known problems
0
25
634
484
2020-04-24T08:57:08Z
Mjh
2
wikitext
text/x-wiki
== Known problems ==
* Although the Infiniband links carry NFS traffic between the nodes and the head node, they are not using RDMA, so their latency is higher and their bandwidth lower than they should be. This seems to be a problem with the Linux kernels on the nodes; NFS over RDMA+Infiniband is unstable. It is less of an issue now that most I/O goes to /beegfs, which does use RDMA.
* The scheduler sometimes crashes for unknown reasons, causing jobs not to run. (Regularly run scripts check and restart the scheduler.)
* The scheduler will very occasionally not run a job that could run immediately in free resources. If <tt>checkjob</tt> shows 'job can run' and <tt>showstart</tt> shows an immediate start, but your job does not run, then please contact the [[administrators]].
* Node specifications of the form <tt>nodes=main:ppn=16</tt> or <tt>nodes=smp:ppn=1</tt> will severely confuse the scheduler, although they are valid. Please do not use queue names in node specifications: always do something like <tt>-q main -l nodes=1:ppn=16</tt> instead.
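For example, a correct submission of a 16-core job to the main queue looks like this (a sketch; the script name is illustrative):
<pre>
# correct: give the queue with -q and keep the node specification generic
qsub -q main -l nodes=1:ppn=16 myjob.sh

# incorrect: do not embed a queue name in the node specification
# qsub -l nodes=main:ppn=16 myjob.sh
</pre>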
== Node hardware/sw (for admin use only) ==
* node087, node093, node103: hardware failures
* node143 -- unstable clock issue
a320759c59d2ede4fd391e69477f4bc13e666d4e
Shell
0
80
635
602
2020-05-01T11:17:16Z
Mjh
2
wikitext
text/x-wiki
For historical reasons, the default shell on UHHPC is tcsh.
The chsh command does not work on the cluster. If you wish to switch to bash, you must ask the [[administrators]] to make the change.
There is a bug in tcsh history processing which means that the .history file may become corrupt after you have run many jobs. To fix this temporarily, remove your .history file. To fix it permanently (at the cost of no longer having persistent history), do
<pre>
cd
rm .history
ln -s /dev/null .history
</pre>
bash does not suffer from this problem.
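To check which shell your account currently uses before asking for a change, something like the following works (a sketch using standard tools):
<pre>
# the login shell recorded for your account
getent passwd $USER | cut -d: -f7
# the shell set in your current environment
echo $SHELL
</pre>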
481d8a6e61d51848a74a969c52cb1b3012912d4d
Python packages
0
49
636
567
2020-05-13T12:51:44Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
0e49bacca683d0216fa7706a277fc0fd67879e1b
637
636
2020-05-13T12:54:41Z
Mjh
2
/* Python virtual environments */
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
e273ce01e09f2a4a17b77ca824aa5e40b5bd3b61
638
637
2020-05-13T12:56:12Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
b0313cfb7475d922a20266f77d7ff95cf7db0935
639
638
2020-06-02T07:49:19Z
Mjh
2
/* Python 3.6 */
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* astropy [http://pypi.python.org/packages/source/a/astropy/astropy-0.2.4.tar.gz]
* kapteyn [http://www.astro.rug.nl/software/kapteyn/]
* h5py
* mpi4py
* hcluster
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these available.
Please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want <tt>ipython3</tt>, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
6e9f66d4acacb5707bd1ae197b4503be9ce7892b
656
639
2020-10-07T08:53:10Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* numpy
* scipy
* astropy
* tensorflow
* h5py
* mpi4py
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these and many more available.
You can install your own local copies of packages that don't exist on the system using <tt>pip install --user PACKAGENAME</tt>. These will appear in your <tt>.local</tt> directory.
However, please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want <tt>ipython3</tt>, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
5dd8852b8d47141fb4b77430e01f41f851aa6045
657
656
2020-10-07T08:53:55Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* numpy
* scipy
* astropy
* tensorflow
* h5py
* mpi4py
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these and many more available.
You can install your own local copies of packages that don't exist on the system using <tt>pip install --user PACKAGENAME</tt>. These will appear in your <tt>.local</tt> directory.
However, please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want <tt>ipython3</tt>, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
<tt>pip3</tt> can be used to install local copies of python3 packages.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip.
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
0ab8b00b2e6de65e79bda8e1fabae2fd98be98d3
658
657
2020-10-07T08:54:25Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* numpy
* scipy
* astropy
* tensorflow
* h5py
* mpi4py
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these and many more available.
You can install your own local copies of packages that don't exist on the system using <tt>pip install --user PACKAGENAME</tt>. These will appear in your <tt>.local</tt> directory.
However, please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
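For example, in bash (a sketch; the package name is illustrative):
<pre>
# make the system-wide Python 2 packages under /soft visible
export PYTHONPATH=/soft/python/lib64/python2.7/site-packages

# install a package that is not already provided; it goes into ~/.local
pip install --user somepackage
</pre>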
== Python 3.6 ==
A separate Python 3.6 installation is available. To use it, run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want <tt>ipython3</tt>, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
<tt>pip3</tt> can be used to install local copies of python3 packages.
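For example (a sketch; the package name is illustrative):
<pre>
# set PYTHONPATH and PATH for Python 3 as described above
module load python3

# install an extra Python 3 package into your home directory
pip3 install --user somepackage
</pre>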
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 assumed):
<pre>
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip (without <tt>--user</tt> option).
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
930ca907503b21db125e00135a10184073125903
Modules
0
33
643
193
2020-06-16T20:37:38Z
Mjh
2
wikitext
text/x-wiki
The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment settings are added to the start of your PATH, MANPATH, LD_LIBRARY_PATH etc, and other variables used by the relevant package may be set as well. When a module is unloaded, the changes made to environment variables are undone.
Documentation of this package is available [http://modules.sourceforge.net/ online], or type <tt>man module</tt>.
Basic commands include:
* <tt>module list</tt>. See what modules you have loaded.
* <tt>module avail</tt>. List what modules are available to you.
* <tt>module load [modulename]</tt>. Load a module, e.g. <tt>module load mpich2-local</tt>
* <tt>module unload [modulename]</tt>. Unload a module.
* <tt>module show [modulename]</tt>. Show what loading a module does.
See <tt>module avail</tt> for a list of currently available modules.
You may use <tt>module</tt> commands in your .bashrc or .cshrc. For example, I have
<pre>
module unload mpich2-x86_64
module load mpich2-local
</pre>
as the first two lines of my .cshrc.
Module commands do not work in job scripts or scripts run by jobs, because the relevant aliases are only set up by login shells. This means that to get the effect of loading a module you should either set the environment variables shown by <tt>module show</tt> manually, or do
<pre>
eval `/usr/bin/modulecmd [shell] load [module]`
</pre>
where <tt>[shell]</tt> is the name of the shell you are using.
We are happy to add other environments as modules -- please contact the cluster [[Administrators]].
9f861f8f6bd454406f949d4b4ad342ec090067d9
653
643
2020-09-30T18:11:43Z
Mjh
2
wikitext
text/x-wiki
The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment settings are added to the start of your PATH, MANPATH, LD_LIBRARY_PATH etc, and other variables used by the relevant package may be set as well. When a module is unloaded, the changes made to environment variables are undone.
Documentation of this package is available [http://modules.sourceforge.net/ online], or type <tt>man module</tt>.
Basic commands include:
* <tt>module list</tt>. See what modules you have loaded.
* <tt>module avail</tt>. List what modules are available to you.
* <tt>module load [modulename]</tt>. Load a module, e.g. <tt>module load mpich2-local</tt>
* <tt>module unload [modulename]</tt>. Unload a module.
* <tt>module show [modulename]</tt>. Show what loading a module does.
See <tt>module avail</tt> for a list of currently available modules.
You may use <tt>module</tt> commands in your .bashrc or .cshrc, e.g. to select your preferred [[MPI]] environment.
Module commands do not work in job scripts or scripts run by jobs, because the relevant aliases are only set up by login shells. This means that to get the effect of loading a module you should either set the environment variables shown by <tt>module show</tt> manually, or do
<pre>
eval `/usr/bin/modulecmd [shell] load [module]`
</pre>
where <tt>[shell]</tt> is the name of the shell you are using.
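For example, in a bash job script this might look like the following (a sketch; the module name is illustrative):
<pre>
# equivalent of 'module load python3' in a non-login shell
eval `/usr/bin/modulecmd bash load python3`
</pre>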
We are happy to add other environments as modules -- please contact the cluster [[Administrators]].
d03d2f02424fc8fa14148e1f4b3745cf1b952f38
To do
0
78
644
590
2020-09-19T08:57:46Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
== Notes ==
To update using lifecycle controller:
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
b267d796940c199dab55d2613f8c6b8ca07375e4
645
644
2020-09-19T09:05:39Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
18b65e3830cd4d71e802f9c08cc4929988a18d5c
664
645
2021-01-22T09:19:48Z
Mjh
2
/* To do next downtime */
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13
* rename test queue to large and think about min job size
* restart IB switch for node025
* install dstorage14 (-:
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
d914495373f84be81eef0f5d40a5a8d3f6045de9
665
664
2021-01-27T09:36:13Z
Mjh
2
/* To do next downtime */
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13
* rename test queue to large and think about min job size
* restart IB switch for node025
* install dstorage14 (-:
* get three chassis6 nodes working
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
b151b78973950f822952e07fce07c68041919845
666
665
2021-02-03T17:22:04Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13
* rename test queue to large and think about min job size
* restart IB switch for node025
* install dstorage14 (-:
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
2d7656d756d8075b522dca4029441692e7125591
667
666
2021-02-03T20:01:48Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13
* rename test queue to large and think about min job size
* restart IB switch for node025
* sort infiniband on dstorage14
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
4cd20d1682306a2626dc8c343a2678eb407b7821
668
667
2021-02-08T20:05:03Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13
* rename test queue to large and think about min job size
* restart IB switch for node025
* sort infiniband on dstorage14
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability
* sort out network speed of em4 on head.data
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
7db4cfaa0ad97bc9ca1ea3295742c10590dd176f
669
668
2021-02-12T12:08:39Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13 [DONE]
* rename test queue to large and think about min job size
* restart IB switch for node025
* sort infiniband on dstorage14
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability
* sort out network speed of em4 on head.data
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off, power it on again (you may need to restart virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where device is whatever appears in /dev/mst and firmware should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
The following was returned by <tt>iblinkinfo</tt> on a new OFED install (the command does not work with the standard one).
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
d30d94e61c37b799947d2865bc6382491828899d
670
669
2021-02-12T13:06:45Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13 [DONE]
* rename test queue to large and think about min job size
* restart IB switch for node025
* sort infiniband on dstorage14 [DONE]
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability
* sort out network speed of em4 on head.data [DONE]
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off; power it on again (you may need to restart the virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where the device name is whatever appears in /dev/mst and the firmware image should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
Returned using iblinkinfo on a new OFED install; this does not work with the iblinkinfo from the standard install.
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
d018f7b0433867031c9080faa6749d6edf8a7dd9
671
670
2021-02-12T14:30:50Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* drive firmware upgrade on dstorage13 [DONE]
* rename test queue to large and think about min job size [DONE]
* restart IB switch for node025 [DONE]
* sort infiniband on dstorage14 [DONE]
* get three chassis6 nodes working
* reboot lofar and other head nodes for security/stability [DONE]
* sort out network speed of em4 on head.data [DONE]
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off; power it on again (you may need to restart the virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where the device name is whatever appears in /dev/mst and the firmware image should be downloaded from https://www.mellanox.com/support/firmware/dell
== IB problem list ==
Returned using iblinkinfo on a new OFED install; this does not work with the iblinkinfo from the standard install.
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
a49bf406b7cde926c7f63ceccbe492ef8af5a0da
673
671
2021-02-14T14:26:04Z
Mjh
2
wikitext
text/x-wiki
= To do next downtime =
* beegfs upgrade
* check IB firmware on dstorage nodes, see below
* get three chassis6 nodes working
== Notes ==
=== To update using lifecycle controller ===
* run chromium-browser, unblock popups
* log in to nodexxx.management
* select 'launch virtual console'
* select lifecycle controller from boot menu and reboot (console controls -- ctrl-alt-del -- apply)
* wait for boot to lifecycle controller
* click Ok to all options and wait for networking, ignore ipv6 warning
* click 'get the latest firmware'
* select 'ftp server'
* say 'ftp.dell.com' NOT 'downloads.dell.com'
* wait...
* click 'apply'
* wait...
* the machine may power off; power it on again (you may need to restart the virtual console)
* eventually the machine will reboot to the lifecycle controller again. Now click 'exit' at top right.
* close the virtual console when a Linux console prompt is showing.
=== To update IB firmware ===
<pre>
/soft/etc/mft-4.15.1-9-x86_64-rpm/install.sh
mst start
flint -d /dev/mst/mt4123_pciconf0 q
cp /soft/etc/fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin /root
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_26_4300-0Y1T43_07TKND_Ax-UEFI-14.19.17-FlexBoot-3.5.805.signed.bin b
mlxfwreset -d /dev/mst/mt4123_pciconf0 reset
</pre>
where the device name is whatever appears in /dev/mst and the firmware image should be downloaded from https://www.mellanox.com/support/firmware/dell
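The to-do item above mentions checking the IB firmware on the dstorage nodes. A minimal sketch of how that check might be scripted is shown below; it assumes the MFT tools are already installed on each node (see install.sh above) and that passwordless root ssh to the nodes works.
<pre>
# Sketch only: report the current ConnectX firmware version on each dstorage node.
for n in $(seq 1 14); do
  echo "== dstorage$n =="
  ssh dstorage$n 'mst start > /dev/null 2>&1; flint -d /dev/mst/*pciconf0 q | grep -i "fw version"'
done
</pre>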
== IB problem list ==
Returned using iblinkinfo on a new OFED install; this does not work with the iblinkinfo from the standard install.
<pre>
40 3[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 252 1[ ] "dstorage5 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 392 1[ ] "metadata mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 36 1[ ] "dstorage1 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 160 1[ ] "lofar-server mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 10[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 4 1[ ] "uhhpc mlx4_0" ( Could be 10.0 Gbps)
40 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 432 1[ ] "smp4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 14[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 224 1[ ] "dstorage4 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 1[ ] "dstorage3 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
40 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 216 1[ ] "dstorage2 mlx4_0" ( Could be FDR10 (Found link at QDR but expected speed is FDR10))
41 3[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 176 1[ ] "dstorage11 mlx4_0" ( Could be 14.0625 Gbps)
41 4[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 368 1[ ] "dstorage9 mlx4_0" ( Could be 14.0625 Gbps)
41 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 396 1[ ] "dstorage7 mlx4_0" ( Could be 14.0625 Gbps)
41 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 384 1[ ] "dstorage8 mlx4_0" ( Could be 14.0625 Gbps)
41 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 228 1[ ] "dstorage6 mlx4_0" ( Could be 14.0625 Gbps)
353 27[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 580 1[ ] "node025 mlx4_0" ( Could be 14.0625 Gbps)
1 8[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 748 1[ ] "node103 mlx5_0" ( Could be 14.0625 Gbps)
1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 716 1[ ] "node112 mlx5_0" ( Could be 14.0625 Gbps)
</pre>
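For reference, the list above can be regenerated with something like the following (a sketch; as noted, it needs the iblinkinfo from a recent OFED rather than the standard one):
<pre>
# Show only the links that iblinkinfo flags as running below their possible speed.
iblinkinfo | grep "Could be"
</pre>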
ef740562099dd1eadb9e3ef7d853e532841590de
MPI
0
12
646
595
2020-09-30T14:47:57Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
ae672c48e41450044793b6c988e62be1ee781702
647
646
2020-09-30T15:06:39Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||4||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||3||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||3||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
5c9ccabaf23af630c8c59246ad2d94f1880e0864
648
647
2020-09-30T15:07:04Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||4||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||3||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||3||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
608846c886fa326a4acd450fa3ebab74ee65ef61
649
648
2020-09-30T15:11:33Z
Mjh
2
/* List of MPI implementations */
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||3||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||2||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||2||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
3270db6a013f3dca5b7f3bb72268f9f3c7158199
650
649
2020-09-30T15:11:54Z
Mjh
2
/* List of MPI implementations */
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||3||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||2||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||2||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
2c1ef2177b7b49e62b1fa138133a70e67f267ddd
651
650
2020-09-30T15:13:42Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||3||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||2||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||2||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
dec4b1ee65dbcc41b139803afe3b2d323590315e
652
651
2020-09-30T15:14:33Z
Mjh
2
/* Running in a job */
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes. So you will see that you don't have to specify anything other than the executable to run on the mpiexec line -- mpiexec will do the right thing.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||3||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||2||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||2||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
2b5ac1f6ca7758431c361d352cc2adc420da363f
672
652
2021-02-14T09:36:50Z
Mjh
2
wikitext
text/x-wiki
== What is MPI? ==
MPI stands for ''Message Passing Interface''. It provides a standard for communication between processes either running on the same host or over a network. Programs written using this standard can communicate with each other without needing to know details of the way in which they are connected. For a more detailed description see the [http://en.wikipedia.org/wiki/Message_Passing_Interface wikipedia page].
The MPI standard (with extensions in some cases) has been implemented by many different groups.
MPI tutorials are widely available on the web.
== MPI on the cluster ==
There are several implementations of MPI on the cluster (see full list below). All of them provide compilation tools (mpicc, mpif77), libraries, and a means of running code. All MPI libraries on the cluster know about, and will use, your current allocation of nodes when you have run a [[jobs|job]]. You should read the page on [[jobs]] before trying to do anything with any of the example scripts given here!
All MPI implementations are provided through the [[modules|module]] system, so the way to use them is largely interchangeable between implementations.
At the time of writing (Sep 2020) we recommend using OpenMPI-4.0.5, as it appears to make the fastest use of the Infiniband network that links the nodes. Some MPI implementations do not use Infiniband at all and these should be avoided if possible.
== Compiling ==
Make sure the correct module is loaded:
<pre>
module load openmpi-4.0.5
</pre>
Now compile your MPI code as normal (e.g. <tt>mpicc</tt>, <tt>mpif77</tt>).
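As a minimal sketch (the file names are examples only, not anything installed on the cluster), compiling a trivial MPI program looks like this:
<pre>
module load openmpi-4.0.5
# Write a trivial MPI test program (example file name only).
cat > hello-mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
# Compile with the MPI wrapper compiler.
mpicc -o hello-mpi hello-mpi.c
</pre>
The resulting binary can then be run under <tt>mpiexec</tt> in a job script as described in the next section.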
== Running in a job ==
Your job control system script should call the correct version of <tt>mpiexec</tt>:
<pre>
#!/bin/sh -f
#PBS -N mpi-demo
#PBS -m abe
#PBS -l nodes=2:ppn=32
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
eval `/usr/bin/modulecmd bash load openmpi-4.0.5`
mpiexec /home/myusername/mympijob
echo ------------------------------------------------------
echo Job ends
</pre>
<tt>mpiexec</tt> uses the Torque environment variables to start processes up on the appropriate nodes. So you will see that you don't have to specify anything other than the executable to run on the mpiexec line -- mpiexec will do the right thing.
Note that in this example the '''only''' key lines are (1) the selection of the right MPI library and (2) the one calling <tt>mpiexec</tt>. The others are Torque directives or are there for debugging/demonstration purposes.
By default, <tt>mpiexec</tt> will run as many processes as there are processors available to it: so if you have nodes=2:ppn=32 set, as in the example above, there will be 2*32 MPI processes (32 per node).
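If you want fewer MPI processes than that, the count can be overridden on the <tt>mpiexec</tt> line; a sketch, reusing the hypothetical executable path from the example above:
<pre>
# Start only 8 MPI processes in total, regardless of how many processors
# Torque has allocated to the job.
mpiexec -n 8 /home/myusername/mympijob
</pre>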
== Warning ==
You may see warning messages as follows:
<pre>
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: node137
Local device: mlx4_0
--------------------------------------------------------------------------
...
[node137:15044] 31 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[node137:15044] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
</pre>
These messages arise because in this version of OpenMPI there are two ways of trying to initialize the Infiniband cards, and they conflict. The messages are harmless but can be suppressed by using:
<tt>mpiexec --mca btl '^openib' ...</tt>
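In a job script this just means replacing the plain <tt>mpiexec</tt> line from the example above with something like:
<pre>
# Suppress the harmless openib initialization warnings.
mpiexec --mca btl '^openib' /home/myusername/mympijob
</pre>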
== List of MPI implementations ==
The available implementations are listed in the table below.
{|
!Module name
!Name
!MPI version
!Infiniband?
|-
|openmpi-4.0.5||OpenMPI 4.0.5||3||Y
|-
|mvapich2-2.3||MVAPICH2 2.3||2||Y
|-
|intel-mpi||Intel MPI Library||3||Y
|-
|mpich2-local||MPICH2||2||N
|-
|mpi/mpich-3.0-x86_64||MPICH 3.0||2||N
|-
|mpi/mpich-3.2-x86_64||MPICH 3.2||2||N
|-
|}
In general you should select an implementation which implements the latest version of MPI, unless your code is too old to use that, and you should always select an Infiniband-aware implementation if you plan to run on multiple nodes.
026df85578754503ee62340aca0748e1aa8da9b0
GPUs
0
71
655
610
2020-09-30T18:15:43Z
Mjh
2
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1: The attached GPUs are 6 Tesla K80 units with 16GB VRAM.
* gpu2 and gpu3: These both have 3 Tesla V100 units each, with 32 GB VRAM per unit on gpu2 and 16 GB per unit on gpu3.
* ramius has a single Tesla K40c.
ramius is a private machine; the other machines are accessible through the gpu queue.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA. You can do e.g. <tt>module load cuda-10.0</tt> to make sure the libraries and compilers are on your path.
Note:
* At the moment, there is no provision for preventing contention between users of the Tesla unit. That is, if more than one user runs a job that tries to use it at one time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
== Tensorflow ==
Tensorflow packages are available under python3.6. Make sure CUDA 10.0 is loaded (see above) and <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt> is on your PYTHONPATH.
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* User needs to start X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* start application, which will need to request an OpenGL context, make use of it, output results unattended and quit after it is done (a render job for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
02ad84907983f9fb007c31d582c388120778496d
663
655
2021-01-14T18:28:33Z
Mjh
2
/* Tensorflow */
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1: The attached GPUs are 6 Tesla K80 units with 16GB VRAM.
* gpu2 and gpu3: These both have 3 Tesla V100 units each, with 32 GB VRAM per unit on gpu2 and 16 GB per unit on gpu3.
* ramius has a single Tesla K40c.
ramius is a private machine; the other machines are accessible through the gpu queue.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA. You can do e.g. <tt>module load cuda-10.0</tt> to make sure the libraries and compilers are on your path.
Note:
* At the moment, there is no provision for preventing contention between users of the Tesla unit. That is, if more than one user runs a job that tries to use it at one time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
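As a sketch of the 'ask for all cores' approach, a GPU job submission might look like the following; the ppn value is a placeholder, so substitute the actual core count of the host you are targeting:
<pre>
# Request a whole node in the gpu queue so that no other job can share the
# host (and hence the GPUs). The ppn and walltime values are examples only.
qsub -q gpu -l nodes=1:ppn=24 -l walltime=12:0:0 mygpujob.sh
</pre>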
== Tensorflow ==
Tensorflow packages are available under python3.6. Make sure CUDA 10.0 is loaded (see above) and <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt> is on your PYTHONPATH e.g. by doing <tt>module load python3 cuda-10.0</tt>. If you are running on a GPU machine you will then get GPU acceleration in Tensorflow.
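A quick way to check that this has worked is a one-liner like the following sketch (run it on one of the gpu machines):
<pre>
module load python3 cuda-10.0
# Should print True if Tensorflow can see a GPU on this machine.
python3 -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
</pre>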
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* User needs to start X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* start application, which will need to request an OpenGL context, make use of it, output results unattended and quit after it is done (a render job for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
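Putting the steps above together, an unattended render run might look like this sketch (the application name is purely illustrative):
<pre>
# Start a headless X server on a free display, point the application at it,
# run the (hypothetical) render job, then shut the X server down again.
X :42 &
export DISPLAY=:42.0
./my_render_job
kill %1
</pre>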
eb3c5cb10d040c1a8c32e1dabe206f17f48838e5
Accounts
0
3
659
551
2020-10-13T16:10:38Z
Mayaahorton
17
wikitext
text/x-wiki
Accounts are available to all staff and research students of UH, and to others by special arrangement.
Access is granted subject to the [[Terms of use]] of the cluster and to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
To get an account, contact the [[administrators]]. When doing so, please provide a valid email address and indicate that you have read the [[terms of use]] and [[policies]]. Your login details will be emailed to you. If you do not specify a preferred username (typically a combination of your first and last name, initials, or the first part of your email address), one will be assigned to you. Your username is visible to others. If you are at UH, please let us know which department you belong to. External users should specify which group they are working with, such as WEAVE or LOFAR.
eb6b31396778c8e022e8266906406a6bd5452f83
674
659
2021-04-22T17:33:21Z
Mjh
2
wikitext
text/x-wiki
Accounts are available to all staff and research students of UH, and to others by special arrangement.
Access is granted subject to the [[Terms of use]] of the cluster and to observance of our usage [[policies]]. Accounts may be suspended or cancelled if the cluster is abused: see also our [[Account cancellation policy]].
To get an account, contact the [[administrators]]. When doing so, please provide a valid email address and indicate that you have read the [[terms of use]] and [[policies]]. Your login details will be emailed to you. If you do not specify a preferred username (typically your UH username if you have one; optionally, a combination of your first and last name, initials, or the first part of your email address), one will be assigned to you. Your username is visible to others. If you are at UH, please let us know which department you belong to. External users should specify which group they are working with, such as WEAVE or LOFAR.
956d2823e174adcac1a3d7594ef344739ce7e356
Administrators
0
6
660
601
2020-10-13T16:15:48Z
Mayaahorton
17
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). Please make sure that your e-mail or help request includes 'UHHPC' in the subject line so that it is routed to the correct team. Failure to do so can result in very long delays. For account requests, please provide your contact details and department or working group as specified on the [[accounts]] page.
External users, such as external consortium users, should send e-mail to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
692a1607727e9c8ce17246063adf40f1daf75ee9
Storage
0
8
661
611
2020-11-13T15:49:43Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely being processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
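For a new user this typically amounts to a one-off setup like the following sketch (substitute your own username; the symlink is optional):
<pre>
# Create a personal working area on the general beegfs volume and, for
# convenience, link to it from the home directory.
mkdir /beegfs/general/your_username
ln -s /beegfs/general/your_username ~/scratch
</pre>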
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage.
There is 1.1 PB of beegfs storage nominally distributed as follows:
* 485 TB: general use, under /beegfs/general
* 272 TB: CAR, under /beegfs/car
* 480 TB: LOFAR-UK, under /beegfs/lofar
* 90 TB: CAIR, under /beegfs/cair
Using one of these subdirectories is simply a statement that you believe you have permission to use that allocation; in practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
With the partial exception of /home (see below), no data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is copied by a nightly rsync to another location on the cluster: that means that if you delete a crucial file we ''may'' have a useful backup (ask straight away).
5ac4464e3d3250c01a35128c25bcd71b3ffc0168
Jobs
0
9
662
625
2020-11-17T11:15:49Z
Mjh
2
/* Running code */
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of it and code that depends on it rely on being able to use ssh or scp.
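Because home directories are shared across the cluster, this is normally a one-off key setup along the lines of the sketch below; see the [[passwordless ssh]] page for the recommended procedure.
<pre>
# Generate a passphrase-less key and authorize it for your own account.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
</pre>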
== Basic commands ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and/or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output in your home directory as it is generated. If you don't, the output will be stored in the locations specified by -o and -e (or in the current working directory of qsub if these are not specified), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=64 job.sh ## run for two hours on 64 CPUs (which may be spread over up to 64 physical nodes)
qsub -l walltime=1:0:0 -l nodes=2:ppn=32 job.sh ## run for one hour on 2 nodes with 32 CPUs per node
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## run two processes on one node for one minute, requesting 8 GB of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. Do not attempt to use these options unless you know what you are doing.)
Note also that in the present configuration requests for fewer processors per node than are available may be consolidated onto fewer nodes. This should never be a problem unless you are specifically testing inter-node communication, in which case you could specify node names to force running on different nodes. For efficiency, you are recommended always to ask for and use the maximum number of processes per node for MPI or multi-threaded code -- that is, 32 processes for the main cluster. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
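The resource requests above can equally be embedded in the script itself as <tt>#PBS</tt> directives. The following is a minimal sketch only: the executable <tt>./my_program</tt> and its input are hypothetical, and the resource values are illustrative rather than recommendations.
<pre>
#!/bin/sh
#PBS -N myjob
#PBS -l walltime=2:00:00
#PBS -l nodes=1:ppn=32
#PBS -l pmem=2gb
# Start in the directory from which qsub was run
cd $PBS_O_WORKDIR
./my_program input.dat
</pre>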
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
You can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between different queues with <tt>qmove</tt>. Look at the <tt>man</tt> pages for these commands for more information.
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication works, see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to misunderstand the argument that -t takes.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
For a large array you have the option to limit the number of jobs that will run concurrently -- perhaps because they all want access to IO resources and will compete with each other and run out of walltime if they all run at once. So
<pre>
qsub -t 1-1000%20 myjob.qsub
</pre>
will run 1000 versions of the job but ensure that only 20 are running at a given time.
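As an illustrative sketch (the program <tt>./simulate</tt>, its arguments and the resource values are hypothetical), an array job script might use <tt>$PBS_ARRAYID</tt> to give each element its own parameters and output file:
<pre>
#!/bin/sh
#PBS -N montecarlo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=04:00:00
cd $PBS_O_WORKDIR
# Each array element gets a different seed and a different output file
./simulate --seed $PBS_ARRAYID > output_$PBS_ARRAYID.txt
</pre>
This could then be submitted with, for example, <tt>qsub -t 1-100%10 montecarlo.qsub</tt>.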
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its exit status).
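Since <tt>qsub</tt> prints the identifier of the job it creates, you can capture it in a shell variable and chain jobs without knowing the job number in advance. A minimal sketch, assuming hypothetical scripts <tt>stage1.qsub</tt> and <tt>stage2.qsub</tt>:
<pre>
# Submit the first stage and remember its job identifier
FIRST=`qsub stage1.qsub`
# Submit the second stage, to run only once the first has finished
qsub -W depend=afterany:$FIRST stage2.qsub
</pre>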
a0fffdc451ec90ea23284adc5f16c2394cc93e42
675
662
2021-04-28T15:20:14Z
Mjh
2
wikitext
text/x-wiki
The main mechanism for carrying out computing on the cluster is the '''batch queue system'''. This is based around [http://docs.adaptivecomputing.com/torque/help.htm Torque] (formerly known as PBS) and [http://docs.adaptivecomputing.com/maui/index.php Maui], which are free software and widely used in the HPC world, particularly in academia. Torque is the queueing system, which handles job submission, and Maui is the scheduler, which decides what jobs are run.
Rather than running processes directly on the compute nodes, you submit a script which, in general, contains both directives to the queuing system and the commands needed to run your jobs. The scheduler will then decide, based on the available resources, whether your job can be run immediately or must be deferred till later. If other people's jobs are running, you are likely to have to wait. It is important therefore to request only the minimum resources you need: a job that only needs 8 nodes may be able to run immediately while a job that requests all nodes in the main cluster may need to wait for days.
Once your job starts running the standard output and standard error can be saved or discarded, at your option. Most jobs will operate on files in one of the cluster-wide file systems (see [[storage]]). You can also arrange to have the system send you e-mail when your job starts and/or ends.
You should enable [[passwordless ssh]] between nodes before starting to use the job submission system. Various parts of the system, and code that depends on it, rely on being able to use ssh or scp.
== Submission of jobs ==
The command to submit a job to the batch queue system is <tt>qsub</tt>.
<tt>qsub</tt> takes one obligatory argument, the name of the script that will be run to start the job:
<pre>
qsub myjob.sh
</pre>
This must be a script: specifying a binary file will not work.
Optionally, qsub can take a number of arguments that specify what resources the job requires and how it is to be handled by the queuing system. Alternatively, these can be specified in the script, tagged with lines beginning <tt>#PBS</tt> before any executable lines. For example,
<pre>
cat << END > myjob1.sh
#!/bin/sh
echo "Hello world"
END
qsub -N hello -m abe myjob1.sh
</pre>
and
<pre>
cat <<END > myjob2.sh
#!/bin/sh
#PBS -N hello
#PBS -m abe
echo "Hello world"
END
qsub myjob2.sh
</pre>
are functionally equivalent. It is often more useful to have the options specified in the script. You can also have a mixture: if the same option is specified on the <tt>qsub</tt> command line and in a script directive, the command line takes precedence.
A full list of qsub options is given by <tt>man qsub</tt> or is available [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/qsub.htm online]. Some important ones are as follows:
* -N: specify the name of the job as it will appear in the queue
* -m: say when the system should e-mail you: <tt>-m abe</tt> means to e-mail on abort, begin and end. Before setting this flag, read the page on [[Mail]] and take appropriate action!
* -k: specify whether the standard output and/or standard error streams of the job should be mirrored in your home directory. If you specify this, you will be able to see the output in your home directory as it is generated. If you don't, the output will be stored in the locations specified by -o and -e (or in the current working directory of qsub if these are not specified), but it will only appear once the job has finished.
* -j: specify whether the output and error streams should be kept separate or merged.
* -o, -e: specify where standard output/error should be stored, if not in the current directory
* -q: specify what [[queues|queue]] the job will be run on.
* -t: start multiple jobs simultaneously (see below)
* -W: for job inter-dependencies (see below)
* -v: to set environment variables (see below)
The most important option is <tt>-l</tt>, which specifies the resources to be used by the job. On our system the useful resources to specify are
* <tt>walltime</tt>: the expected execution time in h:m:s format (if unspecified, the default walltime on all queues is 24 hours).
* <tt>nodes</tt>: a list of requirements on the nodes to be allocated, or possibly more than one joined by a plus sign.
* <tt>pmem</tt>: the physical [[memory]] requirements for your job.
* <tt>file</tt>: the [[local disk space]] to be used by your job.
For example,
<pre>
qsub -l walltime=2:0:0 -l nodes=64 job.sh ## run for two hours on 64 CPUs (which may be spread over up to 64 physical nodes)
qsub -l walltime=1:0:0 -l nodes=2:ppn=32 job.sh ## run for one hour on 2 nodes with 32 CPUs per node
qsub -l walltime=0:1:0 -l nodes=1:2 -l pmem=8gb job5.sh ## run two processes on one node for one minute, requesting 8 GB of memory each
</pre>
(In general you will not want to specify node or [[architecture|chassis]] names directly, as this will interfere with the efficiency of the scheduling. Do not attempt to use these options unless you know what you are doing.)
Note also that in the present configuration requests for fewer processors per node than are available may be consolidated onto fewer nodes. This should never be a problem unless you are specifically testing inter-node communication, in which case you could specify node names to force running on different nodes. For efficiency, you are recommended always to ask for and use the maximum number of processes per node for MPI or multi-threaded code -- that is, 32 processes for the main cluster. If you are running single-threaded code, you should ask for one processor per node.
The defaults for all the other options are sensible but you ''must'' think carefully about the resources you need. Requesting more time or resources than you need will lead to inefficiency in scheduling your jobs: requesting less than you need (particularly in terms of walltime) is likely to mean that your job does not run to completion. A job that exceeds the walltime estimate will be terminated. ''A walltime should be specified: the default walltime (see [[Queues]]) is probably not what you want.''
== Viewing jobs in the queue ==
Once a job is submitted, you can view its progress with <tt>qstat</tt>:
<pre>
>qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
1765.stri-cluster HCG62-doublebeta mjh 00:00:00 R all
1766.stri-cluster ...59-doublebeta mjh 00:00:00 R all
1767.stri-cluster ...07-doublebeta mjh 00:00:00 R all
1768.stri-cluster ...83-doublebeta mjh 00:00:00 R all
1769.stri-cluster ...73-doublebeta mjh 00:00:00 R all
1770.stri-cluster ...61-doublebeta mjh 0 Q all
1771.stri-cluster ...36-doublebeta mjh 0 Q all
1772.stri-cluster ...44-doublebeta mjh 0 Q all
1773.stri-cluster ...29-doublebeta mjh 0 Q all
1774.stri-cluster ...33-doublebeta mjh 0 Q all
1775.stri-cluster ...46-doublebeta mjh 0 Q all
>qstat -a
stri-cluster.herts.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1765.stri-cluster.he mjh all HCG62-doub 11125 16 -- -- 05:00 R 01:56
1766.stri-cluster.he mjh all IC1459-dou 10658 16 -- -- 05:00 R 01:56
1767.stri-cluster.he mjh all NGC3607-do 7355 16 -- -- 05:00 R 01:56
1768.stri-cluster.he mjh all NGC383-dou 12069 16 -- -- 05:00 R 01:56
1769.stri-cluster.he mjh all NGC4073-do 12793 16 -- -- 05:00 R 01:56
1770.stri-cluster.he mjh all NGC4261-do -- 16 -- -- 05:00 Q --
1771.stri-cluster.he mjh all NGC4636-do -- 16 -- -- 05:00 Q --
1772.stri-cluster.he mjh all NGC5044-do -- 16 -- -- 05:00 Q --
1773.stri-cluster.he mjh all NGC5129-do -- 16 -- -- 05:00 Q --
1774.stri-cluster.he mjh all NGC533-dou -- 16 -- -- 05:00 Q --
1775.stri-cluster.he mjh all NGC5846-do -- 16 -- -- 05:00 Q --
</pre>
The different options to qstat give different views of the queue. Here we see that some jobs are running (R) and some are queued (Q). You may also see jobs that have recently completed (C), jobs that are held pending some condition (H) and, hopefully rarely, jobs where there is some error in the scheduling system (E).
== Changing and deleting jobs ==
You can delete jobs from the queue with the Torque command <tt>qdel</tt>, you can change your resource request with <tt>qalter</tt>, and you can move queued jobs between queues with <tt>qmove</tt>. Look at the <tt>man</tt> pages for these commands for more information.
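For example (the job numbers and the destination queue here are purely illustrative):
<pre>
qdel 1770                         # remove job 1770 from the queue
qalter -l walltime=10:00:00 1771  # request a longer walltime for queued job 1771
qmove smp 1772                    # move queued job 1772 to the smp queue
</pre>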
== Running code ==
Your job will start running on ''one'' of the nodes it has been allocated. It is your script's responsibility to start your code running on any other nodes you have been allocated.
At run time, environment variables contain information about the job. A full list is [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/2-jobs/exportedBatchEnvVar.htm here]. The crucial one is $PBS_NODEFILE, which provides a list of nodes on which the job should run. ''You must honour this node list.''
The [http://docs.adaptivecomputing.com/torque/2-5-12/help.htm#topics/commands/pbsdsh.htm pbsdsh] command can be used in your script to run a single program on all the allocated nodes, as in the following example:
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
pbsdsh hostname
echo ------------------------------------------------------
echo Job ends
</pre>
pbsdsh is suitable for starting a number of parallel processes which do not need to communicate, e.g. multiple identical Monte Carlo simulations. For scripts to start [[MPI]] processes in such a way that inter-process communication works, see the description of [[MPI]].
If you are running [[OpenMP]] code, which should only be running on a single node, then you need to make sure that the code does not grab all available CPUs. An easy way to make sure that you take only the CPUs that you have been allocated would be to put
<pre>
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
</pre>
in the qsub script before the code runs.
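Putting this together, a complete OpenMP submission script might look like the following sketch, where <tt>./my_openmp_program</tt> is a hypothetical executable and the resource requests are illustrative only:
<pre>
#!/bin/sh
#PBS -N openmp-example
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
# Use only as many threads as CPUs have been allocated to this job
export OMP_NUM_THREADS=`cat $PBS_NODEFILE | wc -l`
./my_openmp_program input.dat
</pre>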
== Environment ==
In all cases your script must set the environment that you want your program to run in. Environment variables, aliases and working directory are not inherited from the shell that runs qsub. So you will often want to change to an appropriate working directory and to source some startup files in your qsub script before you actually execute any code.
<pre>
#!/bin/sh -f
#PBS -N pbsdsh
#PBS -m abe
#PBS -l nodes=16:ppn=8
#PBS -l walltime=00:01:00
#PBS -k oe
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------
cd /home/fred/my_working_directory
export PATH=/home/fred/my_binaries:${PATH}
/usr/local/bin/mpiexec my-mpi-code arg1 arg2
echo ------------------------------------------------------
echo Job ends
</pre>
Note that you can set environment variables in the environment of the running script with the <tt>-v</tt> option to <tt>qsub</tt>:
<pre>
qsub -v NAME=fred myjob.sh
</pre>
The value of <tt>$NAME</tt> is then available to the script. This is useful if you want to write a generic script and control it using a parameter, rather than editing the script.
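As a sketch of this approach (the paths and the program <tt>./process_dataset</tt> are hypothetical), a generic script can refer to <tt>$NAME</tt> wherever the run-specific value is needed:
<pre>
#!/bin/sh
#PBS -N generic
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
# NAME is supplied on the qsub command line with -v
cd /beegfs/general/$NAME
./process_dataset $NAME
</pre>
The same script can then be submitted repeatedly with different values of <tt>NAME</tt>.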
== Multiple job submission ==
The <tt>-t</tt> option to qsub allows you to create multiple jobs simultaneously. These will all run the same script but will differ in the value of the environment variable <tt>$PBS_ARRAYID</tt>. Your script should normally use this variable to decide how to behave differently. For example, multiple runs of a Monte Carlo simulation might use it to store the output in different files. The jobs will all be submitted to the queue immediately and may run concurrently, depending on the resources they request, so it is important to make sure that it is safe to do this.
A common mistake is to misunderstand the argument that -t takes.
<pre>
qsub -t 4 myjob.qsub
</pre>
will only start ''one'' job, with $PBS_ARRAYID set to 4.
<pre>
qsub -t 1-4 myjob.qsub
</pre>
will queue four jobs, with $PBS_ARRAYID ranging from 1 to 4.
For a large array you have the option to limit the number of jobs that will run concurrently -- perhaps because they all want access to IO resources and will compete with each other and run out of walltime if they all run at once. So
<pre>
qsub -t 1-1000%20 myjob.qsub
</pre>
will run 1000 versions of the job but ensure that only 20 are running at a given time.
== Interactive jobs ==
If you need to access nodes interactively, see the separate page on [[interactive jobs]].
== Jobs that depend on other jobs ==
You may wish to submit jobs that depend on each other: for example, to submit a job that only runs after another job has finished. (This allows you to stack up very long computing requests without violating the maximum walltime constraints.)
Look at <tt>man qsub</tt> for the full details, but for a simple dependence use
<pre>
qsub -W depend=afterany:123456 myjob.qsub
</pre>
which will cause your job to be run only after job 123456 has finished (no matter what its exit status).
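Several dependency types are available (see <tt>man qsub</tt>): for instance <tt>afterok</tt> runs the dependent job only if the earlier job finished successfully. Because <tt>qsub</tt> prints the new job's identifier, a multi-stage pipeline can be chained from the shell, as in this sketch with hypothetical script names:
<pre>
ONE=`qsub stage1.qsub`
TWO=`qsub -W depend=afterok:$ONE stage2.qsub`
qsub -W depend=afterok:$TWO stage3.qsub
</pre>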
28d044a865991d8cebe5bff176e8de3101fa9846
Storage
0
8
676
661
2021-05-01T08:50:24Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 334 TB of scratch for CAR users only, mounted as /car-data
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/ralf : Ralf Napiwotzki
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 1.9 PB of beegfs storage nominally distributed as follows:
* 497 TB: general use, under /beegfs/general
* 553 TB: CAR, under /beegfs/car
* 421 TB: LOFAR-UK, under /beegfs/lofar
* 298 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
c46f4b1207436bc713a009f217c74757b528fa08
686
676
2021-10-01T12:52:50Z
Mjh
2
/* System-wide NFS storage */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 1.9 PB of beegfs storage nominally distributed as follows:
* 497 TB: general use, under /beegfs/general
* 553 TB: CAR, under /beegfs/car
* 421 TB: LOFAR-UK, under /beegfs/lofar
* 298 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
39133210cab102c9f35274fb7c2aa56867aebfbe
688
686
2021-11-04T08:33:34Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 1.9 PB of beegfs storage nominally distributed as follows:
* 497 TB: general use, under /beegfs/general
* 853 TB: CAR, under /beegfs/car
* 421 TB: LOFAR-UK, under /beegfs/lofar
* 298 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
710edcfc5724b978bfb49f6366b8af3e954e922f
689
688
2021-11-05T12:31:31Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 2.2 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 831 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 380 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
a0f0ec9bf91391dc63c7774f2c6807429e422700
695
689
2022-02-14T16:49:56Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 2.2 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 1122 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 671 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
1de52a560c803c76d821c79a3bd01cea7add99a1
696
695
2022-02-14T16:50:33Z
Mjh
2
/* Distributed file system */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 3 TB of user home directories, mounted as /home
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 2.8 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 1122 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 671 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home is backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file we ''may'' have a useful backup (ask straight away).
e932cd1b569724339dcd852863d3ba99dfa083d3
704
696
2022-06-26T08:44:28Z
Mjh
2
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use /home for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username (unless you are a member of a special group with dedicated storage, see below) to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 6.5 TB of user home directories, mounted as /home and /home2
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 3.8 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 1122 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 671 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home and /home2 are backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file from your home directory we ''may'' have a useful backup (ask straight away).
bcaf61bdb03b3d716a43e58c7aa32d6a4647479b
705
704
2022-06-27T07:49:59Z
Mjh
2
/* Overview */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use their home directories (on /home and /home2) for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username (unless you are a member of a special group with dedicated storage, see below) to work on the /beegfs volume. For details of all the different areas available, read more below.
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 6.5 TB of user home directories, mounted as /home and /home2
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 3.8 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 1122 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 671 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
The four [[SMP machines]] have their own local scratch discs: see the relevant page for more information. These are mounted system-wide as /smp1 .. /smp4
Nodes also have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home and /home2 are backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file from your home directory we ''may'' have a useful backup (ask straight away).
aa468efddc5f0691a5643e757d3b4cc3154b7a92
707
705
2022-07-21T19:16:47Z
Mjh
2
/* Disks local to machines */
wikitext
text/x-wiki
The '''storage''' available on the cluster comes in three flavours. Please use all cluster storage space responsibly: when you no longer need it, delete the contents or transfer them elsewhere so that others can use the space. Large datasets are routinely processed on the cluster. (See also [[policies]].)
== Overview ==
Most cluster users will use their home directories (on /home and /home2) for small amounts of important data (no more than 50 GB) and /beegfs/general for large amounts of data. Your home directory is created automatically for you, but you will need to make a directory /beegfs/general/your_username (unless you are a member of a special group with dedicated storage, see below) to work on the /beegfs volume. For details of all the different areas available, read more below.
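For example, a general user could create their working area as follows (assuming <tt>$USER</tt> holds your cluster username, which it normally does):
<pre>
mkdir /beegfs/general/$USER
</pre>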
== System-wide NFS storage ==
This is shared across all machines and served by single servers in each case. It can be slow if many users are trying to access it simultaneously. We are phasing out the use of this kind of storage for large volumes.
Currently general-user NFS volumes are:
* 6.5 TB of user home directories, mounted as /home and /home2
* Software directory /soft
* 58 TB of scratch for CAIR users only, mounted as /cair-scratch
* 77 TB of scratch for CAIR users only, mounted as /cair-work
In addition there are some special NFS volumes, which are mounted under /data and are only available to particular users or groups:
* /data/lofar : for LOFAR-UK users
* /data/jim : Jim Geach
* /data/astroml : Machine learning group
The /home disc is for files that are important and require long-term storage (source code, qsub scripts, etc). It should not be used for general data or for large quantities of output from jobs, for example. Large files should be stored on the relevant data disk (if you want to work relative to /home, use a symbolic link). A [[quota]] system is in place on /home.
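For instance, to work relative to your home directory while keeping the data itself on beegfs, you might create a symbolic link (the link name <tt>data</tt> is arbitrary):
<pre>
ln -s /beegfs/general/$USER ~/data
</pre>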
== Distributed file system ==
The cluster uses the [https://www.beegfs.io/content/ Beegfs] file system for distributed storage, mounted at /beegfs . Files stored here are distributed over a number of servers in a way that's transparent to the user, who sees one big file system. Because there are multiple servers, /beegfs scales much better under load than the NFS storage as long as it is not full.
There is 3.8 PB of beegfs storage nominally distributed as follows:
* 486 TB: general use, under /beegfs/general
* 1122 TB: CAR, under /beegfs/car
* 500 TB: LOFAR-UK, under /beegfs/lofar
* 671 TB: CACP, under /beegfs/cair
Using one of these subdirectories indicates that you believe you have permission to use the corresponding allocation. In practice there is no difference in where or how the data are stored.
== Disks local to machines ==
Nodes have a small amount of [[local disk space]] accessible to jobs (but not mounted on the head node), and [[ramdisks]] are also available.
== Backups ==
No data area on the cluster is currently backed up. You must take responsibility for your own backups.
/home and /home2 are backed up by nightly rsync to another location on the cluster: that means if you delete a crucial file from your home directory we ''may'' have a useful backup (ask straight away).
3b074c4e02fa5a6edb759b9acba4ab4a7a88f27a
Reservations
0
46
677
326
2021-05-27T11:16:19Z
Mjh
2
wikitext
text/x-wiki
It is possible to request reservations of one or more nodes at fixed times in the future. If you have reserved a node, no other jobs than yours will be able to run on it.
You can request a reservation from the [[administrators]] or, if you have a particular need to do this often, we can set things up so that you can create your own reservations automatically. Please only ask for a reservation if you believe that there is no method of doing what you want within the standard [[Jobs|queueing system]].
Reservations will usually be for a group of people, but may be for an individual. If you need to use a group reservation, you will need to know the name of the group in question, and you will need to belong to that group. Typing <tt>groups</tt> at a shell prompt on the head node will tell you what groups you belong to.
General guidelines for reservations are as follows:
* If creating a reservation yourself, reserve the machine(s) for a period when no existing jobs will be running -- use <tt>qstat</tt> to determine when this will be.
* If you are using a personal reservation, use the reservation by submitting a job as normal. Any reservation available to you will be used automatically. If you need an interactive session, use an [[interactive jobs|interactive job]]; e.g. if you have reserved smp2 for two days, do <tt>qsub -I -q smp -l nodes=smp2:ppn=48 -l walltime=48:00:00</tt>.
* If you are using a group reservation, specify that you want to use it by adding the option <tt>-W group_list=[groupname]</tt> to the <tt>qsub</tt> command or script. E.g. to use 8 cores of the <tt>scuba2</tt> group reservation on smp1 interactively, do <tt>qsub -W group_list=scuba2 -q smp -l nodes=smp1:ppn=8 -I</tt>; a batch-script version is sketched after this list. Again, the reservation will be used if the resources are available, and your job will otherwise go into the general pool.
* If you no longer need a reservation, e-mail the administrators to ask them to delete it.
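A non-interactive job can use a group reservation in the same way. Here is a minimal sketch of a submission script; the <tt>scuba2</tt> group name, queue, node and resource requests are examples only:
<pre>
#!/bin/bash
# 8 cores on smp1 for 12 hours, using the scuba2 group reservation if available
#PBS -q smp
#PBS -W group_list=scuba2
#PBS -l nodes=smp1:ppn=8
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR    # start in the directory the job was submitted from
./my_analysis        # replace with your own program
</pre>
Submit it with <tt>qsub myscript.sh</tt> as usual.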
d20156d8eb08a8ed1c15f214cb0ccbd642cba767
Cluster bibliography
0
30
678
608
2021-06-02T10:54:53Z
Asinha
12
Add Sinha et al. 2021 to list
wikitext
text/x-wiki
Please add details of any papers written using the cluster here. This allows us to keep track of the productivity of the cluster. You can use any format you like so long as you identify the authors, the title (this helps us keep track of what different areas of research are being carried out with the cluster), and an indication of the bibliographic details and date.
* Ankur Sinha, Christoph Metzner, Neil Davey, Roderick Adams, Michael Schmuker, and Volker Steuber. Growth rules for the repair of asynchronous irregular neuronal networks after peripheral lesions. '''PLOS Computational Biology''', 17(6):1–35, '''2021'''. URL: https://doi.org/10.1371/journal.pcbi.1008996, doi:10.1371/journal.pcbi.1008996.
* Patel, H., & Kukol, A. ('''2019'''). Prediction of ligands to universally conserved binding sites of the influenza A virus nuclear export protein. ''Virology'', 537, 97-103.
* Taylor, P., Kobayashi, C., Federrath, C. The metallicity and elemental abundance maps of kinematically atypical galaxies for constraining minor merger and accretion histories. '''2019'''. MNRAS, 485, 3215
* Wang, E.X., Taylor, P., Federrath, C., Kobayashi, C. The impact of black hole seeding in cosmological simulations. '''2019'''. MNRAS, 483, 4640
* Taylor, P., Federrath, C., Kobayashi, C. The origin of kinematically distinct cores and misaligned gas discs in galaxies from cosmological simulations. '''2018'''. MNRAS, 479, 141
* Taylor, P., Kobayashi, C. The metallicity and elemental abundance gradients of simulated galaxies and their environmental dependence. '''2017'''. MNRAS, 471, 3856
* Taylor, P., Federrath, C., Kobayashi, C. Star formation in simulated galaxies: understanding the transition to quiescence at 3x10^10 solar masses. '''2017'''. MNRAS, 469, 4249
* Taylor, P., Kobayashi, C. Time evolution of galaxy scaling relations in cosmological simulations. '''2016'''. MNRAS, 463, 2465
* Taylor, P., Kobayashi, C. Quantifying AGN-driven metal-enhanced outflows in chemodynamical simulations. '''2015'''. MNRAS, 452L, 59
* Taylor, P., Kobayashi, C. The effects of AGN feedback on present-day galaxy properties in cosmological simulations. '''2015'''. MNRAS, 448, 1835
* Taylor, P., Kobayashi, C. Seeding black holes in cosmological simulations. '''2014'''. MNRAS, 442, 2751
* Stotz, H., Pascoe, H., Parham, H., Fitt, B., Mashanova, A., Kukol, A., ... & Hossein, B. ('''2018'''). Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae. ''PLoS ONE'', 13(06), 1-17. [e0198201]. DOI: 10.1371/journal.pone.0198201
* Patel, H., & Kukol, A. ('''2017'''). Evolutionary conservation of influenza A PB2 sequences reveals potential target sites for small molecule inhibitors. ''Virology'', 509, 112-120. DOI: 10.1016/j.virol.2017.06.009
* Patel, H., & Kukol, A. ('''2016'''). Evaluation of a novel virtual screening strategy using receptor decoy binding sites. ''J Negative Results in BioMedicine'', 15(15).
* Kukol A, Hughes DJ ('''2014'''). Large-scale analysis of Influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites, ''Virology'', 454/55: 40-47.
* Poojari, C, Kukol, A, Strodel, B ('''2013'''). How the amyloid-β peptide and membranes affect each other: An extensive simulation study, ''Biochimica et Biophysica Acta - Biomembranes'', 1828(2), 327-339.
* Steuernagel O., Kakofengitis D., Ritter G., Wigner flow reveals topological order in quantum phase space dynamics, '''2012''', accepted by ''PRL''.
* Hardcastle MJ, Krause MGH, Numerical modelling of the lobes of radio galaxies in cluster environments, '''2012''', submitted to ''MNRAS''
* Kalia M, Kukol A ('''2011'''). Structure and dynamics of the kinase IKK-β – a key regulator of the NF-kappa B transcription factor, ''J Struct Biol'', 176(2), 133-142.
* Kukol A ('''2011'''). Consensus virtual screening approaches to predict protein ligands, ''Eur J Med Chem'', 46(9), 4661-4664.
* Hardcastle MJ, Croston JH, Modelling TeV gamma-ray emission from the kiloparsec-scale jets of Centaurus A and M87, '''2011''', ''MNRAS'' 415 433
* Goodger JL, Croston JH, Hardcastle MJ, The influence of radio-loud AGN on groups of galaxies: Paper I - the X-ray luminosity-temperature relationship, submitted to ''MNRAS'', May 2011.
* de Sousa G, Maex R, Adams R, Davey N, Steuber V, Optimization of neuronal morphologies for pattern recognition, published at ''BMC Neuroscience'' '''2010'''
* Karen Safaryan, Reinoud Maex, Rod Adams, Neil Davey, and Volker Steuber, The Effect of Non-Specific LTD on Pattern Recognition in Cerebellar Purkinje Cells, published at ''BMC Neuroscience'' '''11''', P92
69d126c666d194541631c628dabfdeb83c9d0fdc
Administrators
0
6
679
660
2021-07-13T09:05:57Z
Mayaahorton
17
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). Please make sure that your e-mail or help request includes 'UHHPC' in the subject line so that it is routed to the dedicated cluster support team. Failure to do so can result in long delays. For account requests, please provide your contact details and department or working group as specified on the [[accounts]] page.
External users, such as external consortium users, should send e-mail to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
3d73dca4162988faceea34760541a971e318ce7c
699
679
2022-05-06T10:27:54Z
Mayaahorton
17
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). Please make sure that your e-mail or help request includes 'UHHPC' in the subject line so that it is routed to the dedicated cluster support team. Failure to do so can result in long delays. For account requests, please provide your contact details and department or working group as specified on the [[accounts]] page. Sending technical support requests to individual staff members is unlikely to get a response.
Helpdesk access is restricted to those who have current UH login details. External users, such as external consortium users, should ask their UH collaborators to submit helpdesk requests where possible. If this is not possible, you can send e-mail directly to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
816ffd9be9eacf0738f883732a4993ca4fc7fed2
700
699
2022-05-06T10:28:13Z
Mayaahorton
17
/* Administrators */
wikitext
text/x-wiki
== Administrators ==
UH users wanting help or support with UHHPC should ask for it through the UH helpdesk (e-mail helpdesk@herts.ac.uk or log in to [https://helpdesk.herts.ac.uk/ the web interface]). Please make sure that your e-mail or help request includes 'UHHPC' in the subject line so that it is routed to the dedicated cluster support team. Failure to do so can result in long delays. For account requests, please provide your contact details and department or working group as specified on the [[accounts]] page. Sending technical support requests to individual staff members is unlikely to get a response.
Helpdesk access is unfortunately restricted to those who have current UH login details. External users, such as external consortium users, should ask their UH collaborators to submit helpdesk requests where possible. If this is not possible, you can send e-mail directly to Martin Hardcastle (m.j.hardcastle@herts.ac.uk).
2db67aa2ab8651d5f01826c1ebeead48d848e37e
Read this first
0
70
680
514
2021-07-15T16:02:35Z
Mayaahorton
17
/* Introduction to cluster computing */
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes', which are individual computers joined together by a network. Nodes have different roles. Specifically, there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. When you first log in, you will end up in your home directory and will have the ability to create and store folders as well as upload, move and download data using standard Linux commands or a repository such as GitHub. However, it is very important that you do not run scripts or large data processing tasks in this initial environment because you may slow down the whole cluster. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes (or alternatively submit an '[[Interactive jobs|interactive job]]'). Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements. Any code or task that would take more than a minute or two to execute needs to be submitted to the job scheduler. The [[administrators]] can, and often have to, terminate jobs that are running on the head nodes.
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
* [[Queues]] -- to understand which queue to use
* [[Storage]] -- to understand how and where to store data on the cluster
Please don't approach the [[administrators]] for help until you have read and understood these pages.
a55a807b66be0db805d94b4178cf4c254c1714ad
681
680
2021-07-15T16:03:10Z
Mayaahorton
17
/* Introduction to cluster computing */
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes', which are individual computers joined together by a network. Nodes have different roles. Specifically, there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. When you first log in, you will end up in your home directory and will have the ability to create and store folders as well as upload, move and download data using standard Linux commands or a repository such as GitHub. However, it is very important that you do not run scripts or large data processing tasks in this initial environment because you may slow down the whole cluster. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes (or alternatively submit an '[[Interactive jobs|interactive job]]'). Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements. Any code or task that would take more than a minute or two to execute needs to be submitted to the job scheduler. The [[administrators]] can, and often have to, terminate jobs that are running on the head nodes.
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
* [[Queues]] -- to understand which queue to use
* [[Storage]] -- to understand how and where to store data on the cluster
Please don't approach the [[administrators]] for help until you have read and understood these pages.
4c578d4ca116bc13bed17f51b663953fdb9de279
682
681
2021-07-15T16:05:17Z
Mayaahorton
17
/* Introduction to cluster computing */
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes', which are individual computers joined together by a network. Nodes have different roles. Specifically, there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. When you first log in, you will end up in your home directory and will have the ability to create and store folders as well as upload, move and download data using standard Linux commands or connect to a repository such as GitHub. However, it is very important that you do not run scripts or large data processing tasks in this initial environment because you may slow down the whole cluster. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes (or alternatively submit an '[[Interactive jobs|interactive job]]'). Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements. Any code or task that would take more than a minute or two to execute needs to be submitted to the job scheduler. The [[administrators]] can, and often have to, terminate jobs that are running on the head nodes.
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
* [[Queues]] -- to understand which queue to use
* [[Storage]] -- to understand how and where to store data on the cluster
Please don't approach the [[administrators]] for help until you have read and understood these pages.
fb28f2979fda4455df4358d260abbf44dcb8a3b9
683
682
2021-07-15T16:06:35Z
Mayaahorton
17
/* Introduction to cluster computing */
wikitext
text/x-wiki
= Introduction to cluster computing =
If you are new to the concept of cluster computing, read this '''before doing anything else'''.
Cluster computing is fundamentally different from personal computing. If you do not understand how it works, you may cause problems for yourself or other users, or break [[Policies|cluster rules]].
The cluster is composed of 'nodes', which are individual computers joined together by a network. Nodes have different roles. Specifically, there are a few 'login nodes' or 'head nodes' (what you see when you log in to the cluster), and many (about 150) [[Architecture|compute nodes]], which are where the actual computing work is done. When you first log in, you will end up in your home directory and will have the ability to create and store folders as well as upload, move and download data using standard Linux commands or connect to a repository such as GitHub. However, it is very important that you do not run scripts or large data processing tasks in this initial environment because you may slow down the whole cluster. Rather, you write a script to run a '[[Jobs|job]]' on the compute nodes (or alternatively submit an '[[Interactive jobs|interactive job]]'). Your script tells the cluster both what you want to do, and what resources you need to do it. Your job is then run on one or more compute nodes that match your requirements. Any code or task that would take more than a minute or two to execute needs to be submitted to the job scheduler. The [[administrators]] can, and often have to, terminate jobs that are running on the head nodes.
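To make the model concrete, here is a minimal sketch of a job script and its submission; the queue, resource requests and program name are examples only, and the [[Jobs]] page has the details:
<pre>
#!/bin/bash
# request one core on one node in the main queue for two hours
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR    # run from the directory the job was submitted from
./my_program         # replace with your own code
</pre>
Save this as e.g. <tt>myjob.sh</tt>, submit it from a head node with <tt>qsub myjob.sh</tt>, and monitor it with <tt>qstat</tt>.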
If your code is capable of doing so, this model allows you to access many nodes simultaneously, and so to get much greater resources than your desktop or laptop computer can provide. But you need to understand it to use it.
New users should read '''at least''' the following Wiki pages:
* [[Accounts]] -- to find out how to get an account
* [[Access]] -- to find out how to get access to the cluster
* [[Architecture]] -- to find out what nodes there are
* [[Jobs]] -- to find out how to run jobs on appropriate compute nodes
* [[Queues]] -- to understand which queue to use
* [[Storage]] -- to understand how and where to store data on the cluster
Please don't approach the [[administrators]] for help until you have read and understood these pages.
088b4057878d092657895e4cd800a2961d02231f
Python packages
0
49
684
658
2021-08-10T20:33:26Z
Mjh
2
/* Python virtual environments */
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* numpy
* scipy
* astropy
* tensorflow
* h5py
* mpi4py
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these and many more available.
You can install your own local copies of packages that don't exist on the system using <tt>pip install --user PACKAGENAME</tt>. These will appear in your <tt>.local</tt> directory.
However, please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use this, just run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want IPython3, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
<tt>pip3</tt> can be used to install local copies of python3 packages.
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 and bash assumed -- for tcsh use unsetenv not unset):
<pre>
unset PYTHONPATH
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip (without <tt>--user</tt> option).
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-10.1
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
f7ca739684dc82a4f558aee4433816a0c8d24235
685
684
2021-09-22T13:08:38Z
Mjh
2
wikitext
text/x-wiki
Local python packages installed in <tt>/soft</tt> include
* numpy
* scipy
* astropy
* tensorflow
* h5py
* mpi4py
Set your PYTHONPATH to <tt>/soft/python/lib64/python2.7/site-packages</tt> to make these and many more available.
You can install your own local copies of packages that don't exist on the system using <tt>pip install --user PACKAGENAME</tt>. These will appear in your <tt>.local</tt> directory.
However, please don't install local copies of software that is already globally available! If you need an upgraded version, talk to the system [[administrators]].
== Python 3.6 ==
A separate Python 3.6 installation is available. To use this, just run <tt>python36</tt> or <tt>python3.6</tt>.
If you want to use system-wide installations of packages you will need to set your PYTHONPATH to include <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt>, and make sure no Python 2 packages are on your PYTHONPATH.
If you want IPython3, add <tt>/soft/python3/usr/local/bin</tt> to your PATH.
<tt>module load python3</tt> will make these changes for you.
<tt>pip3</tt> can be used to install local copies of python3 packages.
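For example (a sketch only; <tt>PACKAGENAME</tt> stands for whatever package you need):
<pre>
module load python3                 # sets PYTHONPATH and PATH for the Python 3.6 installation
pip3 install --user PACKAGENAME     # installs into your ~/.local directory
python3.6 -c "import PACKAGENAME"   # quick check that the package is importable
</pre>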
== Python virtual environments ==
Sometimes it is necessary to run Python in a clean environment -- particularly if you want Python3 to be the default Python.
To set this up do the following (Python3 and bash assumed -- for tcsh use unsetenv not unset):
<pre>
unset PYTHONPATH
python3 -m venv py3-venv
source py3-venv/bin/activate
pip3 install --upgrade pip
</pre>
This creates a directory py3-venv which will be used for your own private installation of packages. In future whenever you do
<pre>
source py3-venv/bin/activate
</pre>
your python version will be python3 and you will have the ability to install python3 packages to your py3-venv directory using pip (without <tt>--user</tt> option).
For example, to get set up with Tensorflow with GPU support:
<pre>
source py3-venv/bin/activate
pip install tensorflow
module load cuda-11.4
ipython3
</pre>
If all goes well then you will be able to do
<pre>
import tensorflow as tf
tf.print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
</pre>
Note that using this method means you will also have to install any other modules you need, such as matplotlib, using pip install.
254b2f611df24d9b03490e8bb51a830e2eb2f419
Interactive jobs
0
35
687
499
2021-10-22T17:02:31Z
Mjh
2
/* Multiple CPUs */
wikitext
text/x-wiki
Interactive jobs are jobs which give you an interactive session on one of the compute nodes. Importantly, accessing the compute nodes this way means that the job control system guarantees the resources that you have asked for. If you simply log into a compute node with ssh (which is in any case forbidden by the cluster access [[policies]]), another user's job may compete with what you are trying to do. Therefore you should, unless explicitly authorized otherwise, always use the interactive job facility to run interactively on the compute nodes.
== Running an interactive job ==
An interactive job is run using the <tt>qsub</tt> command as for normal [[jobs]]. However, the result of running an interactive job is that you are logged on to the target machine. For example,
<pre>
[user@headnode1 ~]$ qsub -l walltime=00:30:00 -I -q main
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@node047 ~]$
</pre>
In this example the user has requested a 30-minute session on any node in the main cluster (the default request is one processor on one node). A node is available to run the job and so the user finds herself at a shell prompt. She may now use the node interactively for 30 minutes. At the end of the 30 minutes (wall time, not CPU time!) the job will end and the session will be terminated. If the user logs out before then, the job will terminate early.
Note that, just as with ordinary jobs, you must honour the allocation you are given. If you ask for one CPU on one node, you must only use that. In the interactive shell, a number of environment variables beginning with PBS (try <tt>printenv | grep PBS</tt>) are provided to tell you the job environment in which you are running in case you have forgotten.
If your request for an interactive shell cannot be fulfilled (because there are insufficient nodes available in the queue that you have specified), the qsub command will wait until it can be.
== Advanced topics ==
=== Multiple CPUs ===
If you know you want a machine completely dedicated to you (e.g. because you plan to run multithreaded code interactively) then you must explicitly request that: e.g.,
<pre>
qsub -l walltime=24:00:00 -l nodes=1:ppn=32 -I -q smp
</pre>
will reserve all 32 cores of one of the [[SMP machines]] for you for a day.
=== Multiple nodes ===
In the unlikely situation in which you want to run interactively on several nodes at once, you will only be given one shell. You can use the <tt>pbsdsh</tt> command to run the same commands on each of the processors you have been allocated, or /usr/local/bin/mpiexec to run [[MPI]] jobs.
<pre>
qsub -l walltime=00:01:00 -l nodes=2:ppn=2 -I -q smp
qsub: waiting for job 123456.stri-cluster.herts.ac.uk to start
qsub: job 123456.stri-cluster.herts.ac.uk ready
[user@smp2 ~]$ pbsdsh hostname
smp2
smp1
smp1
smp2
</pre>
=== Specific machines ===
It is possible to request a specific machine just as for normal non-interactive [[jobs]]:
<pre>
qsub -l walltime=00:01:00 -l nodes=smp2:ppn=48 -I -q smp
</pre>
Do this if you know that you need to use some particular feature of the machine, such as the [[SMP machines]]' scratch discs.
=== X forwarding ===
If you want to run jobs that use the X windows system, you may turn on X11 forwarding using the <tt>-X</tt> option. (This is like the <tt>-X</tt> option to <tt>ssh</tt>.)
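For example, to get a two-hour interactive session with X forwarding on the main queue (a sketch; adjust the queue and walltime to your needs):
<pre>
# log in to the head node with X forwarding enabled first (ssh -X), then:
qsub -I -X -q main -l walltime=02:00:00
</pre>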
=== Walltime requests ===
Please be considerate in your walltime requests. While you have an interactive job running, no other user can use the resources you have allocated. Therefore, you should not set an interactive job running for a week and walk away. Jobs that appear to be reserving resources without using them will be terminated. Obviously the best strategy is to select a walltime that roughly matches what you want and exit before the time is up.
7997715999e267fbb32d964f429a08c55b9fa434
Software
0
17
690
642
2021-12-03T16:25:03Z
Mayaahorton
17
/* Astronomical software */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default, <tt>module load gcc-6.4</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2018b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
* [[DS9]]: in /soft/bin/ds9
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
e4dd59107acbaa374090ed32a496af520ba9ad9c
715
690
2023-04-29T09:49:32Z
Mjh
2
/* Programming languages and development environments */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default; <tt>module load gcc-6.4</tt>, <tt>gcc-10.4</tt> or <tt>gcc-13.1</tt> for newer versions
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2022b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
* [[DS9]]: in /soft/bin/ds9
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]] in <tt>/soft/bin/singularity</tt>
b992d675fce320a7e33ed297cecb1b0ad65db259
716
715
2023-04-29T09:50:08Z
Mjh
2
/* Containerization */
wikitext
text/x-wiki
This page documents the software installed on the cluster and its location.
Detailed local documentation of software, if available, goes in a page specific to that software linked from this list. Users are encouraged to update the wiki with descriptions of the software they use.
Contact the [[administrators]] if you need an upgraded version of any of these. If you need any freely available software package that runs on Linux, in general we can install it. We do not have resources to pay for licences for commercial software, though, so if you need a commercial software package to be installed you will also need to provide funding for a licence.
= Programming languages and development environments =
* GNU C, C++ and Fortran: available by default; <tt>module load gcc-6.4</tt>, <tt>gcc-10.4</tt> or <tt>gcc-13.1</tt> for newer versions (see the module usage sketch after this list)
* Intel C and Fortran: <tt>module load intel</tt>
* Python 2 and 3 and many [[Python packages]]
* [[Matlab]]: in <tt>/soft/MATLAB/R2022b/bin/matlab</tt>
* [[IDL]]: in <tt>/soft/idl/idl/bin</tt>
* [[R]]: installed by default or <tt>/soft/R</tt>
* [[Julia]]: <tt>module load julia</tt>
* [[GPUs|CUDA]]: <tt>module load cuda...</tt>
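Where a package is provided as a module, the usual environment-modules commands apply. For example (a sketch, using the gcc module named above):
<pre>
module avail            # list all available modules
module load gcc-6.4     # put the newer compiler on your PATH
gcc --version           # confirm which compiler you are now using
module unload gcc-6.4   # revert to the default compiler
</pre>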
= Astronomical software =
* Astropy: see [[Python packages]]
* [[AIPS]]: 31DEC18 installed in <tt>/soft/aips</tt>
* [[CASA]]: installed in <tt>/soft/casa...</tt>
* [[Starlink]]: in <tt>/soft/star</tt> or <tt>/soft/stardev</tt>
* [[LOFAR]]: in <tt>/soft/lofar-xxxxxx</tt>, see page for details
* [[Miriad]]: in <tt> /soft/miriad</tt>
* [[ciao]]: in <tt>/soft/ciao-x.x</tt>
* [[SAS]]: in <tt>/soft/xmmsas_xxxxxx</tt>
* [[PLUTO]]: see page for documentation
* [[aoflagger]]: in <tt>/soft/aoflagger</tt>
* [[wsclean]]: in <tt>/soft/wsclean</tt>
* [[Brats]]: in <tt>/soft/brats</tt>
* [[Topcat]] and Stilts: in <tt>/soft/topcat</tt>
* [[DS9]]: in <tt>/soft/bin/ds9</tt>
= Engineering =
* [[Ansys Fluent]]: in <tt>/soft/ansys_inc</tt>
* [[Star-CCM+]]: in <tt>/soft/STAR-CCM+14.02.010</tt>
* [[Cantera]]: in <tt>/soft/cantera</tt>
* [[Converge]]: in <tt>/soft/CONVERGE_Studio/</tt>
= Molecular dynamics =
* [[NAMD]]: in <tt>/soft/NAMD_2.13_Linux-x86_64-ibverbs-smp-CUDA</tt>
* [[Gromacs]]: in <tt>/soft/gromacs-2018-gpu</tt>
* Autodock [[Vina]]: in <tt>/soft/autodock_vina_1_1_1_linux_x86</tt>
* [[Autodock]] : in <tt>/soft/autodock</tt>
* [[iGemDock]]: in <tt>/soft/iGEMDOCKv2.1-centos/</tt>
= Computational neuroscience =
* [[neuron]]: in <tt>/soft/nrn</tt>
= Optimization =
* [[Gurobi]]: in <tt>/soft/gurobi</tt>
= Containerization =
* [[Singularity]]: in <tt>/soft/bin/singularity</tt> or <tt>module load singularity...</tt>
be8620907384a48a792b44aba4ef2a60eb015b87
Main Page
0
1
691
552
2021-12-03T16:28:12Z
Mayaahorton
17
/* How-Tos */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Terms of use]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[GPUs]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
* [[Galaxy|How to use Galaxy on the cluster]]
== Known problems ==
* [[Known problems]]
20be339cace77f930524a3d204d232d1c44b908b
701
691
2022-05-06T10:29:00Z
Mayaahorton
17
/* Troubleshooting */
wikitext
text/x-wiki
== Welcome to the cluster documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Terms of use]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[GPUs]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Start here]]
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
* [[Galaxy|How to use Galaxy on the cluster]]
== Known problems ==
* [[Known problems]]
ce9d54736e4f6434c66a42f82a281977d0f49c4f
725
701
2023-09-26T10:32:52Z
Mjh
2
wikitext
text/x-wiki
== Welcome to the UHHPC documentation wiki ==
This wiki is the location for documentation for the UH HPC service.
If you are a cluster user, feel free to register for an account so that you can describe any method you use to achieve a particular result on the cluster. You must be approved for an account before editing the Wiki. Some local documentation on [[MediaWiki]] is available.
== Getting started ==
* [[Read this first]]
== Cluster basics ==
* [[Accounts]] and [[Account cancellation policy]]
* [[Terms of use]]
* [[Policies]]
* [[Fair share]]
* [[Access]]
* [[Architecture]]
* [[Networking]]
* [[Storage]]; [[quota]] system
* [[Administrators]]' contact details
== Using the cluster ==
* [[Jobs]]
* [[Queues]]
* [[Reservations]]
* [[SMP machines]]
* [[GPUs]]
* [[Modules]]
* [[MPI]]
* [[OpenMP]]
* [[Parallelization|How to parallelize your job]]
* [[Compilers]]
* [[Software]]
* [[Mail]]
* [[Web server]]
* [[Monitoring]]
* [[LOFAR-UK Compute Facility]]
== Troubleshooting ==
* [[Start here]]
* [[Why doesn't my job run?]]
* [[Job errors]]
== Publications ==
* [[Acknowledgements]]
* [[Cluster bibliography]]
== How-Tos ==
* [[Star-CCM+|How to use Star-CCM+ on the cluster]]
* [[Galaxy|How to use Galaxy on the cluster]]
== Known problems ==
* [[Known problems]]
e0de67892d4e7c22fcd6841c09676ae45012d084
Galaxy
0
86
692
2021-12-03T16:28:21Z
Mayaahorton
17
Created page with "Coming soon."
wikitext
text/x-wiki
Coming soon.
4b112f37651048ba2ea49ff06d2785674491b2b3
PLUTO
0
87
693
2021-12-03T17:06:41Z
Mayaahorton
17
Created page with "PLUTO is used for astrophysical fluid dynamics and other applications. It is widely used on the cluster but is sensitive to MPI issues and has different versions. You will nee..."
wikitext
text/x-wiki
PLUTO is used for astrophysical fluid dynamics and other applications. It is widely used on the cluster but is sensitive to MPI issues and has different versions. You will need to download and install the version that best matches your needs. It can be freely downloaded [http://plutocode.ph.unito.it from here] and includes extensive documentation. You may belong to a research group which uses a modified version. Whilst the software can take time to master, you are advised to read the documentation and try out some of the test problems. This is particularly true if you are also new to cluster computing -- many test problems can be run quickly on a laptop, allowing you to become familiar with the setup. Of course, large problems will eventually require the use of the cluster. Many, but not all, PLUTO problems require three files to run: init.c, definitions.h and pluto.ini. These are typically used to set up the grid, define variables and store the actual models required for your problem. Once these are set up you will need to generate and run a makefile as outlined in the documentation.
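As an illustration of the basic workflow (a sketch only, assuming PLUTO has been unpacked into <tt>$HOME/PLUTO</tt>; directory and test-problem names follow the standard distribution and may differ in your version):
<pre>
export PLUTO_DIR=$HOME/PLUTO
cd $PLUTO_DIR/Test_Problems/HD/Sod   # one of the bundled test problems
python $PLUTO_DIR/setup.py           # interactive configuration; generates the makefile
make                                 # build the pluto executable
./pluto -i pluto.ini                 # run the test problem
</pre>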
13f430974bfec4b720e6d827bee901b5d411b901
Queues
0
15
694
581
2022-02-14T14:49:34Z
Mayaahorton
17
wikitext
text/x-wiki
There are eight possible job queues available for general use on the system:
* 'main' is the default queue: this submits to the main cluster. The maximum wall time on this queue is 1 week.
* 'gpu' submits to the GPU nodes. The maximum wall time on this queue is 48 hours.
* 'large' submits to 64 core nodes. The maximum wall time on this queue is 1 week, but you may require permission if your job requires a high number of nodes.
* 'test' submits to 96 core nodes. Currently, access is limited to those requiring a high number of CPUs; if you don't already have access please contact a member of the team to discuss your needs. The maximum wall time is 1 week.
* 'smp' submits to the [[SMP machines]]. It is not currently possible to run MPI jobs that span the SMP machines and the main or CAIR clusters. The maximum wall time for this queue is 48 hours.
* 'cair_l' submits to the dedicated CAIR nodes. This queue is restricted to CAIR users.
* 'car' submits to the dedicated CAR nodes. This queue is restricted to CAR users. The maximum wall time for this queue is 1 week.
* 'forecast' submits to the dedicated air quality forecast nodes.
== Default wall times ==
The default wall time for all the queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
b1634b00c6ef6f788940a484453fac979aed182f
710
694
2022-08-15T10:27:10Z
Mjh
2
wikitext
text/x-wiki
There are five possible job queues available for general use on the system:
* 'main' is the default queue: this submits to the main cluster. The maximum wall time on this queue is 1 week.
* 'gpu' submits to the GPU nodes. The maximum wall time on this queue is 48 hours.
* 'large' submits to 64 core nodes. The maximum wall time on this queue is 1 week, but you may require permission if your job requires a high number of nodes.
* 'test' submits to 96 core nodes. Currently, access is limited to those requiring a high number of CPUs; if you don't already have access please contact a member of the team to discuss your needs. The maximum wall time is 1 week.
* 'forecast' submits to the dedicated air quality forecast nodes.
== Default wall times ==
The default wall time for all the queues is 24 hours. If you want a job to run for longer, you must explicitly provide a wall time estimate.
== Other defaults ==
The default memory use per process for the main queue is 1 Gb. If you need more memory than this, see the section on [[memory]].
The default number of nodes for a job on all queues is 1.
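For example, to override the defaults when submitting (a sketch; the memory resource name <tt>pmem</tt> and the values shown are illustrative, see the [[memory]] page):
<pre>
# three days of wall time, 4 cores on one node, 4 GB per process, on the main queue
qsub -q main -l walltime=72:00:00 -l nodes=1:ppn=4 -l pmem=4gb myjob.sh
</pre>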
a5e3f479f4d21608d469b8005b88cadf37e0e02f
Web server
0
32
697
289
2022-03-22T11:16:22Z
Mjh
2
wikitext
text/x-wiki
The web server <tt>http://uhhpc.herts.ac.uk/</tt> is visible inside and outside the university.
If you create a directory <tt>public_html</tt> in your home directory, then its contents are visible at <tt>http://uhhpc.herts.ac.uk/~your-username/</tt>. You can use this to export data; for large datasets, use symbolic links to /beegfs . Do not rely on the long-term existence of this facility (e.g. you should not use the cluster to host your personal home page).
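For example (a sketch; the directory names are illustrative):
<pre>
mkdir ~/public_html
chmod 755 ~/public_html                                       # the web server must be able to read it
ln -s /beegfs/general/$USER/results ~/public_html/results     # export a large dataset held on /beegfs
</pre>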
935bc154a2265b637acf893fe8ac9312d1de4bfc
Matlab
0
88
698
2022-05-02T09:27:42Z
Mayaahorton
17
Created page with "At the present time, the current working version of MATLAB on the cluster can be accessed using /soft/MATLAB/2018b/bin/matlab. MATLAB on the cluster works best when scripted b..."
wikitext
text/x-wiki
The current working version of MATLAB on the cluster can be accessed using <tt>/soft/MATLAB/R2018b/bin/matlab</tt>. MATLAB on the cluster works best when scripted; if you really need a GUI you can use X forwarding (the <tt>-X</tt> option), but for everyday interactive computation there is little advantage in using the cluster over desktop PCs on campus. The exception is heavy computation and simulations, which do benefit from the cluster but will likely be slowed down considerably by X forwarding, so run them as scripted batch jobs where possible.
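For scripted (non-GUI) use inside a job, something like the following works (a sketch; <tt>myscript.m</tt> is a placeholder for your own script, and the path matches the version given above):
<pre>
# run a MATLAB script in batch mode, without a display
/soft/MATLAB/R2018b/bin/matlab -nodisplay -nosplash -r "run('myscript.m'); exit"
</pre>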
If you need a different version for a specific reason please contact us.
ce043684ef176fe961fd4988d34f4464821c2716
Start here
0
89
702
2022-05-06T11:10:01Z
Mayaahorton
17
Created page with "==Overview== High Performance Computing environments are often built on heterogeneous architecture, meaning that no two clusters are alike. Code that runs on one HPC system i..."
wikitext
text/x-wiki
==Overview==
High Performance Computing environments are often built on heterogeneous architecture, meaning that no two clusters are alike. Code that runs on one HPC system is not guaranteed to work on another. Even within the UHHPC, some machines are designed for certain tasks and have software and filesystems mounted that are not available from elsewhere in the cluster. When code doesn't run, it is tempting to think that the cluster is broken. More often, though, there is a coding problem rather than a hardware problem.
The UHHPC admin team manages more than 400 users across approximately 20 different research groups and departments. We work with external partners at dozens of institutions worldwide. The cluster runs thousands of specialised research software packages and modules. The UHHPC is a shared research facility and users are responsible for their own research and experimental design. Our primary goal is the maintenance and development of the physical infrastructure. We are a very small team and as such cannot help optimise experiments or debug code. Except in rare circumstances, we cannot compile or recompile code for you (if you really need this, please talk to us first).
Most errors are not caused by hardware malfunction (although this does happen occasionally, particularly after power outages). It can be very difficult to know whether a problem is caused by a cluster issue which needs to be reported to the helpdesk, or a code issue which you could solve yourself. The following sections are designed to give a quick look at the four main classes of scripting error and some steps you can take to try to resolve them yourself.
(Sections coming soon)
==Job scheduling problems==
UHHPC uses a PBS-based batch system: the Torque resource manager with the Maui scheduler. You can submit jobs interactively or through a submission script. A list of common scheduler errors, including resource request errors, will be given below (coming soon).
==Local environment problems==
Including difficulties with paths, dependencies, shell setup and local installations (coming soon)
==Software problems==
Coming soon
==MPI problems==
An overview of some of the most challenging problems to identify and resolve (coming soon)
f9d51691cb29b41bdcf6851a02a0946f54bd2dff
703
702
2022-05-06T11:39:31Z
Mayaahorton
17
/* Software problems */
wikitext
text/x-wiki
==Overview==
High Performance Computing environments are often built on heterogeneous architecture, meaning that no two clusters are alike. Code that runs on one HPC system is not guaranteed to work on another. Even within the UHHPC, some machines are designed for certain tasks and have software and filesystems mounted that are not available from elsewhere in the cluster. When code doesn't run, it is tempting to think that the cluster is broken. More often, though, there is a coding problem rather than a hardware problem.
The UHHPC admin team manages more than 400 users across approximately 20 different research groups and departments. We work with external partners at dozens of institutions worldwide. The cluster runs thousands of specialised research software packages and modules. The UHHPC is a shared research facility and users are responsible for their own research and experimental design. Our primary goal is the maintenance and development of the physical infrastructure. We are a very small team and as such cannot help optimise experiments or debug code. Except in rare circumstances, we cannot compile or recompile code for you (if you really need this, please talk to us first).
Most errors are not caused by hardware malfunction (although this does happen occasionally, particularly after power outages). It can be very difficult to know whether a problem is caused by a cluster issue which needs to be reported to the helpdesk, or a code issue which you could solve yourself. The following sections are designed to give a quick look at the four main classes of scripting error and some steps you can take to try to resolve them yourself.
(Sections coming soon)
==Job scheduling problems==
UHHPC uses a PBS-based batch system: the Torque resource manager with the Maui scheduler. You can submit jobs interactively or through a submission script. A list of common scheduler errors, including resource request errors, will be given below (coming soon).
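When a job does not start or behaves unexpectedly, the scheduler itself can usually tell you why. For example (a sketch, where <tt>JOBID</tt> is the number returned by <tt>qsub</tt>):
<pre>
qstat -u $USER      # list your queued and running jobs
qstat -f JOBID      # full details of one job, including its resource requests
checkjob JOBID      # Maui's view of why a queued job is not yet running
</pre>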
==Local environment problems==
Including difficulties with paths, dependencies, shell setup and local installations (coming soon)
==Software problems==
Including issues with compiling. Will also cover what to check when your code is running much more slowly than expected. Coming soon
==MPI problems==
An overview of some of the most challenging problems to identify and resolve (coming soon)
0bb4191536fcdf48c11a2bfb699d8fda36251d0b
Quota
0
38
706
540
2022-07-21T19:14:38Z
Mjh
2
wikitext
text/x-wiki
Use of space on <tt>/home</tt> and <tt>/home2</tt> is subject to a quota system. You have a fixed amount of space that you are allowed to use: if you exceed this, you will no longer be able to create files in your home directory.
The current default quota for all users is 50 Gb. When you reach 49 Gb, you will be warned and given a period (1 week) in which your usage should be reduced below 49 Gb; if you fail to reduce usage in this period, or if your usage reaches 50 Gb, new file creation will be blocked.
The quota is ''not'' an indication of reasonable expected usage for a cluster user. You should try to keep your use of the home directory as low as possible.
If you believe you have a specific need to exceed this quota, please contact the cluster [[administrators]].
There is no quota on the various data areas (see [[Storage]]) and these are the locations where it is appropriate to store large volumes of data.
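You can check how much space you are using with the standard tools (a sketch; the exact output depends on the system configuration):
<pre>
quota -s    # show your current usage and limits in human-readable units, where quotas are exposed
du -sh ~    # total size of your home directory, as a cross-check
</pre>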
2be2b9e6aeadbb0cabdc20450a8a18d2c2e028ea
Architecture
0
7
708
654
2022-07-22T08:49:25Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 3 head nodes: headnode1, headnode2, headnode3 each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband (node001-080: rack1, rack2, rack3), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node097-112: rack4) in the large queue.
* 16 AMD EPYC 7552 2 socket x 48-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node081-096: rack5) in the test queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[GPUs|GPU]] machines, gpu1-4, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A file server, stri-server, which is a 2 socket x 8 core Xeon machine
** 473 TB of [[storage]] attached via Fibre Channel to this server.
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* metadata and dstorage1-12, file servers providing 1.1 PB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
47a9ed2eca574239d7e0b5774b630ee70af7300b
709
708
2022-07-22T08:50:08Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 3 head nodes: headnode1, headnode2, headnode3 each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband (node001-080: rack1, rack2, rack3), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node097-112: rack4) in the large queue.
* 16 AMD EPYC 7552 2 socket x 48-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node081-096: rack5) in the test queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[GPUs|GPU]] machines, gpu1-4, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A cair data processing server - [[cair-cluster]], providing logins and storage to CAIR users
** 132 TB of [[storage]] attached via Fibre Channel to this server.
* A separate server -- [[cair-forecast]] -- providing an additional 80 TB of storage and processing for CAIR AQF use.
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* smp4 and smp5, dedicated 32-core and 48-core machines
* metadata and dstorage1-22, file servers providing 3.9 PB of BeegFS distributed [[storage]]
== Networking ==
* Ethernet and infiniband switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
b4e9d6544662e17cbfef154624e8e7313cb58152
720
709
2023-08-07T09:38:13Z
Mjh
2
/* Servers and dedicated login nodes */
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 3 head nodes: headnode1, headnode2, headnode3 each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband (node001-080: rack1, rack2, rack3), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node097-112: rack4) in the large queue.
* 16 AMD EPYC 7552 2 socket x 48-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node081-096: rack5) in the test queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[GPUs|GPU]] machines, gpu1-4, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* smp4 and smp5, dedicated 32-core and 48-core machines
* smp6, dedicated 96-core machine
* metadata and dstorage1-23, file servers providing 4.2 PB of BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and InfiniBand switches provide connectivity.
The image below shows the physical layout of the cluster components as at late 2011. The cluster is no longer in this configuration.
http://stri-cluster.herts.ac.uk/cluster2.jpg
e24b2eb687cdddb07bee12f3e03fe51111f7f512
721
720
2023-08-07T09:38:29Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 3 head nodes: headnode1, headnode2, headnode3 each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband (node001-080: rack1, rack2, rack3), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node097-112: rack4) in the large queue.
* 16 AMD EPYC 7552 2 socket x 48-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node081-096: rack5) in the test queue
* 12 Xeons (E5-2660s) 2 socket x 8-core with 64 GB RAM and FDR Infiniband (node129-140: chassis 9), in the main queue
* 4 Xeons (E5-2660s) 2 socket x 8-core with 16 GB RAM and FDR Infiniband are dedicated AQF (CAIR) nodes (node141-144: chassis 9), in the forecast queue
* Four [[GPUs|GPU]] machines, gpu1-4, in the gpu queue
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* smp4 and smp5, dedicated 32-core and 48-core machines
* smp6, dedicated 96-core machine
* metadata and dstorage1-23, file servers providing 4.2 PB of BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and InfiniBand switches provide connectivity.
56d67c07e9487f812b5b7c9df462f3a01729d213
724
721
2023-09-26T10:31:23Z
Mjh
2
wikitext
text/x-wiki
The cluster consists of
== head/login nodes ==
* 3 head nodes: headnode1, headnode2, headnode3 each with 192 GB RAM and 2 x 12-core Xeon processors, for user login and development
* job/queue server, uhhpc, with 12 cores and 32 GB RAM: web, jobs, mail, DNS etc
== compute nodes ==
* 80 Xeons (Gold 6130) 2 socket x 16 core (no hyperthreading) with 192 GB RAM and FDR Infiniband (node001-080: rack1, rack2, rack3), in the main queue
* 16 AMD EPYC 7452 2 socket x 32-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node097-112: rack4) in the large queue.
* 16 AMD EPYC 7552 2 socket x 48-core (no hyperthreading) with 256 GB RAM and FDR Infiniband (node081-096: rack5) in the test queue
* 16 Xeons (E5-2660s) 2 socket x 8-core with varying amounts of RAM and FDR Infiniband (node129-144: chassis 9), in the main queue
* Four [[GPUs|GPU]] machines, gpu1-4, in the gpu queue
* Coming soon, 8 new GPU machines (gpu5-12) with 4xA100 GPUs.
* Coming soon, 8 new 96-core nodes
== GPUs ==
* See the separate page on [[GPUs]]
== Servers and dedicated login nodes ==
* A separate server -- [[lofar-server]] -- providing logins and 100 TB of storage for [[LOFAR-UK Compute Facility|LOFAR-UK]] users
* ramius, a dedicated 16-core machine
* mancuso, a dedicated 20-core machine
* smp4 and smp5, dedicated 32-core and 48-core machines
* smp6-8, 96-core 1-TB RAM systems
* metadata and dstorage1-23, file servers providing 4.2 PB of BeeGFS distributed [[storage]]
== Networking ==
* Ethernet and InfiniBand switches provide connectivity.
0518003e35f0f26a7826a5219fccdc4e28d9bee0
CASA
0
28
711
334
2023-01-05T14:08:16Z
Mayaahorton
17
wikitext
text/x-wiki
CASA is software for radio astronomy data reduction. Various versions are installed on the cluster. The latest version is always in <tt>/soft/casapy</tt> (a symbolic link to the real directory).
To use casa, do <tt>module load casa</tt> and then run it with <tt>casapy</tt> or <tt>casa</tt>.
You should not run CASA on the head node: either run it through the batch job system or use an [[interactive jobs|interactive job]].
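A minimal batch-job sketch for running CASA non-interactively is shown below. The job name, queue, resources, working directory, script name and CASA options are all illustrative and should be adjusted to your needs; module loading uses the <tt>modulecmd</tt> form described on the [[Modules]] page.
<pre>
#!/bin/sh
#PBS -N casajob
#PBS -q main
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -j oe
# load the casa module (module aliases are not available in job scripts)
eval `/usr/bin/modulecmd sh load casa`
# illustrative working directory containing the reduction script
cd /home/username/casawork
# run CASA unattended on a Python reduction script
casa --nogui --nologger -c my_reduction.py
</pre>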
062acbb81b53be9ff93f5e9fb0c8052355c2ca46
Singularity
0
79
712
603
2023-01-05T16:52:27Z
Mjh
2
wikitext
text/x-wiki
Singularity ([https://github.com/sylabs/singularity]) is installed on the cluster. This is our preferred way of running containerized applications, including Docker containers.
You need to use the singularity [[modules|module]] (<tt>module load singularity</tt>) to get singularity on your path. You probably want to use the <tt>--bind</tt> option to make data directories such as your BeeGFS storage visible inside the container.
Note that singularity images can't be built on BeeGFS (they can be stored there once built). This will affect users converting from Docker images. If this causes you problems please contact the [[administrators]].
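As a hedged example (the image name and bind paths are illustrative), pulling a Docker image and running a command in it might look like this:
<pre>
module load singularity
# convert a Docker image into a Singularity image file (build it somewhere other than BeeGFS)
singularity pull ubuntu.sif docker://ubuntu:22.04
# run a command inside the container, binding a BeeGFS data directory into it
singularity exec --bind /beegfs/username/data:/data ubuntu.sif ls /data
</pre>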
ad1bff97711879de8cff4a4e79556b2bb0fb2087
Fair share
0
39
713
553
2023-02-09T16:09:12Z
Mjh
2
wikitext
text/x-wiki
There are various systems in place to try to make sure that everyone gets a fair share of the cluster (particularly the main cluster) and that a mixture of long and short, large and small jobs can be run.
Jobs are run based on a priority assigned by the scheduler. The priority takes account of a number of factors:
* Jobs that involve more CPUs intrinsically have higher priority. This is to stop single-CPU work pushing out large jobs.
* Jobs that have been in the queue for longer have higher priority, up to a maximum of 64 idle jobs per user (after that, the jobs will wait in the queue but will not accrue priority)
* Jobs from users who have not recently significantly used the cluster have priority over those from users who have. (This is called the fair-share mechanism: the 'fair share amount' is a weighted integral of usage over the last week, weighted towards more recent use.)
In addition, by default,
* no user can have more than 512 processors' worth of jobs running at once. This is intended to stop a single user monopolizing the cluster
* no user can have a processor-time product that exceeds 1 week x 256 cores running at any given time. This is intended to stop large long jobs blocking shorter jobs.
These policies apply to all queues but are tailored to the main queue where the competition is highest. If they cause you problems, please contact the [[administrators]]. We are happy to review policies to try to get the fairest result for everyone, and we can relax the default requirements if you have a particular need for more resources.
0560d02a90354e79bb8f494043b2db43fcc9a4af
Gromacs
0
19
714
558
2023-04-29T09:47:56Z
Mjh
2
wikitext
text/x-wiki
[http://www.gromacs.org Gromacs] is a software package for molecular dynamics simulation. It treats molecules as particles in a classical mechanics force field.
There are a number of versions of Gromacs on the cluster. Gromacs 2018 and 2023 are the most recent ones.
== '''How to perform a simulation with Gromacs' mdrun:''' ==
1) You need to prepare the binary simulation start file (tpr-file) either on your local Linux machine or on the head node of the cluster. If you prepare it on the local machine, make sure you use the Gromacs version corresponding to the one you want to use on the cluster.
In order to run Gromacs on the head node for preparation, it is a good idea to put the following into your shell startup file (the <tt>export</tt> lines below use bash syntax; tcsh users should use <tt>setenv</tt> instead):
* For 2023 version
source /soft/gromacs-2023.1/bin/GMXRC (either CPU or GPU version)
export LD_LIBRARY_PATH="/soft/gcc-10.4.0/lib64:${LD_LIBRARY_PATH}"
* For 2018 version
source /soft/gromacs-2018/bin/GMXRC (or /soft/gromacs-2018-gpu/bin/GMXRC for GPU preparation)
export LD_LIBRARY_PATH="/soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
2) Prepare a job script (e.g. <tt>runjob.sh</tt>) as shown in the example below. [[Jobs|More info.]]
3) Make the script executable: <tt>chmod +x runjob.sh</tt>
4) Submit the job to the cluster: <tt>qsub runjob.sh</tt>
The script below needs to be adjusted according to your needs, i.e. the user name (#PBS -u), the maximum time a job should run (#PBS -l walltime=<hours>:<min>:<sec>), the working directory and the details of the mdrun command.
Look here for [[groperform|optimising performance]].
* There are two versions of Gromacs 2018.2, for non-GPU and GPU use, located in /soft/gromacs-2018 and /soft/gromacs-2018-gpu respectively
Note that all GPUs attached to the node are used automatically. The maximum walltime is 48 hours.
* For Gromacs 2023.1 both CPU and GPU versions are located in /soft/gromacs-2023.1. Use gmx for GPUs and 32-core compute nodes. Use gmx_mpi for 64 and 96-core compute nodes.
[http://www.phy.bme.hu/~cluster/docs/PBS.html A good tutorial for using PBS]
''Andreas/Hershna''
'''For GPU:'''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q gpu
#PBS -l nodes=1:ppn=16
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the gpu machine on the cluster
# uses 1 GPU machine
# set a maximum time of forty eight hours (walltime)
# merge 'standard error' into 'standard output' (-j oe)
# keep the output and error files while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2018-gpu/bin/GMXRC
# specify working directory:
cd /home/hpatel/gromacsGPU
export LD_LIBRARY_PATH="soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
--------------
For non-GPU use, Gromacs is optimised for the newer nodes that contain 32 cores. In order to make sure that the job runs on these nodes, you have to request them with #PBS -l nodes=1:ppn=32. An example of a job script is shown below:
'''Without use of GPU:'''
--------------
<pre>#!/bin/sh
#PBS -N GromacsTest
#PBS -q main
#PBS -l nodes=1:ppn=32
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -k oe
#PBS -u hpatel
# runs a job with name 'GromacsTest' on the main cluster
# set a maximum time of forty eight hours (walltime)
# merge 'standard error' into 'standard output' (-j oe)
# keep the output and error files while the job is running (-k oe)
# specifies user 'hpatel'
# set required paths:
source /soft/gromacs-2018/bin/GMXRC
# specify working directory:
cd /home/hpatel/gromacs
export LD_LIBRARY_PATH="soft/mpi/mvapich2-1.6/lib:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="/soft/gcc-6.4/lib64:${LD_LIBRARY_PATH}"
### This is the command ###
gmx mdrun -s run.tpr -c after_md.gro -v -stepout 1000
### command end ###
# start with 'qsub runjob.sh'
</pre>
24d16e8af043ffb749debed6605e710b64c30379
Compilers
0
16
717
53
2023-04-29T09:58:09Z
Mjh
2
wikitext
text/x-wiki
The standard C and Fortran compilers are <tt>gcc</tt> and <tt>gfortran</tt>. The Intel compilers <tt>icc</tt> and <tt>ifort</tt> are also available and may be superior for many applications.
By default the gcc and gfortran versions are 4.8.5. These are very old and may not work with modern software.
To access later versions look at <tt>module avail</tt>. Currently gcc-6.4, gcc-10.4 and gcc-13.1 are available.
To run software built with these you will need to have the relevant libraries on your LD_LIBRARY_PATH unless you have explicitly loaded the module. E.g. in a job script you might do <tt>setenv LD_LIBRARY_PATH /soft/gcc-10.4.0/lib64</tt> (tcsh) or <tt>export LD_LIBRARY_PATH=/soft/gcc-10.4.0/lib64</tt>.
The same requirements apply to the Intel compilers accessible with <tt>module load intel</tt>, where <tt>/soft/intel/lib/intel64_lin</tt> needs to be on your LD_LIBRARY_PATH.
2ba2356559b064646adbb51c1e4df037c5f1a7bf
719
717
2023-04-29T09:59:31Z
Mjh
2
wikitext
text/x-wiki
The standard C and Fortran compilers are <tt>gcc</tt> and <tt>gfortran</tt>. The Intel compilers <tt>icc</tt> and <tt>ifort</tt> are also available and may be superior for many applications.
By default the gcc and gfortran versions are 4.8.5. These are very old and may not work with modern software.
To access later versions look at <tt>module avail</tt>. Currently gcc-6.4, gcc-10.4 and gcc-13.1 are available.
To run software built with these you will need to have the relevant libraries on your LD_LIBRARY_PATH unless you have explicitly loaded the [[Modules|module]]. E.g. in a job script you might do <tt>setenv LD_LIBRARY_PATH /soft/gcc-10.4.0/lib64</tt> (tcsh) or <tt>export LD_LIBRARY_PATH=/soft/gcc-10.4.0/lib64</tt>.
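For example, you might build with a newer gcc on a head node and then point your job at the matching runtime libraries. This is a sketch in which the module name and program name are illustrative (check <tt>module avail</tt> for the exact names):
<pre>
# on the head node: load a newer compiler and build
module load gcc-10.4
gcc -O2 -o myprog myprog.c

# in the job script (bash syntax): make the matching libraries visible, then run
export LD_LIBRARY_PATH=/soft/gcc-10.4.0/lib64:${LD_LIBRARY_PATH}
./myprog
</pre>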
The same requirements apply to the Intel compilers accessible with <tt>module load intel</tt>, where <tt>/soft/intel/lib/intel64_lin</tt> needs to be on your LD_LIBRARY_PATH.
05f25884e67e2fa6974798cce0ed94e6972ac49b
Modules
0
33
718
653
2023-04-29T09:58:52Z
Mjh
2
wikitext
text/x-wiki
The cluster uses the <tt>environment-modules</tt> package to manage environment variable settings for software packages. When a 'module' is loaded, the appropriate environment settings are added to the start of your PATH, MANPATH, LD_LIBRARY_PATH etc, and other variables used by the relevant package may be set as well. When a module is unloaded, the changes made to environment variables are undone.
Documentation of this package is available [http://modules.sourceforge.net/ at this link], or type <tt>man module</tt>.
Basic commands include:
* <tt>module list</tt>. See what modules you have loaded.
* <tt>module avail</tt>. List what modules are available to you.
* <tt>module load [modulename]</tt>. Load a module, e.g. <tt>module load mpich2-local</tt>
* <tt>module unload [modulename]</tt>. Unload a module.
* <tt>module show [modulename]</tt>. Show what loading a module does.
You may use <tt>module</tt> commands in your .bashrc or .cshrc, e.g. to select your preferred [[MPI]] environment.
Module commands do not work in job scripts or scripts run by jobs because the relevant aliases are only set up by login shells. This means that to get the effect of loading a module you should either manually set the environment variables shown by <tt>module show</tt> or do
<pre>
eval `/usr/bin/modulecmd [shell] load [module]`
</pre>
where <tt>[shell]</tt> is the name of the shell you are using.
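For example, in a bash job script the <tt>mpich2-local</tt> module mentioned above could be loaded with:
<pre>
eval `/usr/bin/modulecmd bash load mpich2-local`
</pre>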
We are happy to add other environments as modules -- please contact the cluster [[Administrators]].
34068dddaf4ddd34b3fb94c198c0031eb6928fd5
GPUs
0
71
722
663
2023-09-04T07:36:15Z
Mjh
2
wikitext
text/x-wiki
Several machines on the cluster have attached NVIDIA GPUs.
* gpu1: The attached GPUs are 6 Tesla K80 units with 16GB VRAM (this machine is out of service)
* gpu2 and gpu3: these each have 3 Tesla V100 units, with 32 GB VRAM per GPU on gpu2 and 16 GB per GPU on gpu3.
* gpu4: This has one Tesla V100S and two V100s, with a mixture of 16 GB and 32 GB.
* ramius has a single Tesla K40c.
ramius is a private machine; the other machines are accessible through the gpu queue.
The NVIDIA CUDA software is installed in '/soft/cuda-VERSION' where VERSION is the version number of CUDA. You can do e.g. <tt>module load cuda-10.0</tt> to make sure the libraries and compilers are on your path.
Note:
* At the moment, there is no provision for preventing contention between users of the Tesla unit. That is, if more than one user runs a job that tries to use it at one time, there may be contention and we don't know what effect this will have.
* Equally, there is no provision for preventing contention on the host side -- if you only ask for a few CPU cores and another CPU-bound job takes the remainder, this may (or may not) interfere with your job.
The easiest way to overcome any issues of contention would be to ask for all cores of the host machine, even if you're not going to use them all: then the job submission system will not allow any other jobs to run on the host, so you have exclusive access to the GPU.
The current arrangement in principle allows non-GPU jobs running on a machine with a GPU to block CUDA jobs. If this becomes a problem, we will review the queueing arrangements.
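For example, a job script could reserve a whole gpu-queue host with a request like the following; the core count shown matches the gpu-queue examples elsewhere on this wiki and may differ on newer machines:
<pre>
#PBS -q gpu
# ask for all cores of the host so that no other job can share the node
#PBS -l nodes=1:ppn=16
</pre>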
== Tensorflow ==
Tensorflow packages are available under python3.6. Make sure CUDA 10.0 is loaded (see above) and <tt>/soft/python3/usr/local/lib64/python3.6/site-packages</tt> is on your PYTHONPATH e.g. by doing <tt>module load python3 cuda-10.0</tt>. If you are running on a GPU machine you will then get GPU acceleration in Tensorflow.
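A quick way to check that Tensorflow can see a GPU once the environment is loaded is sketched below; the exact call may differ between Tensorflow versions.
<pre>
module load python3 cuda-10.0
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
</pre>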
== Via OpenGL context ==
It is possible for an application to access an OpenGL context on the GPUs using the provided 'headless' X server configuration.
* The user needs to start an X server:
<pre>
X :42 &
</pre>
where "42" is a free display number, assuming "42" is free. If a number is chosen that is not free, an error message will appear.
* Set the DISPLAY environment variable:
<pre>
export DISPLAY=:42.0
</pre>
where "42" is the free display number as used for starting the X server, assuming the syntax of the BASH shell and .0 denotes the first screen (there are currently four - but using more than one is untested).
* Start the application, which should request an OpenGL context, make use of it, write its results unattended and quit when it is done (a render job, for example).
Submission of such jobs through the job queue is also untested, so needs to be discussed with the administrators first.
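Putting the steps above together, an unattended run might look like the following sketch; the display number and application name are illustrative and, as noted above, running this through the job queue is untested.
<pre>
#!/bin/bash
# start a headless X server on a display number assumed to be free
X :42 &
XPID=$!
sleep 5
# point the application at the new X server (first screen)
export DISPLAY=:42.0
# run the (hypothetical) OpenGL application unattended
./my_render_job
# shut the X server down afterwards
kill $XPID
</pre>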
e780fa7456fecf3f23b5e0f4ce8d705cb45a753a
Policies
0
4
723
505
2023-09-21T11:04:39Z
Mjh
2
wikitext
text/x-wiki
The cluster is by design a shared resource. In using it you must be considerate of other users.
Some detailed guidelines are as follows:
* Accounts are for use by the named user only. You must not allow anyone else to use your account.
* The [[architecture|head node]]s must '''never''' be used for computation of any kind on a larger scale than a few minutes' testing or processing of results from the compute nodes.
* The only permitted method of using the compute nodes is by way of the [[jobs|batch queuing system]]. You '''must not''' log in to the compute nodes directly using ssh. If you need an interactive session on a compute node, e.g. for data processing, use the [[interactive jobs]] facility.
* When using the batch queuing system you '''must''' honour the allocations of nodes that your job is given.
* Please use the [[storage]] with consideration to others. We reserve the right in extreme situations to tidy up after you.
* There is a [[fair share]] policy which has been tuned to a certain extent and is operated automatically by the scheduler. Essentially this means that you may find that other people's jobs overtake yours in the queue in an attempt to distribute resources fairly. The intention is that this system will operate only on a timescale of days, though. We do not presently guarantee fairness on timescales of minutes to hours, because to do that we would have to be able to pre-empt jobs that are already running. Please be aware of these limitations of the fair-share policy and work within them.
24003d633fdd76f79afe9e61adbfd18b7695b338
User:Jmcgarry
2
90
726
2023-09-26T10:33:16Z
Mjh
2
Creating user page for new user.
wikitext
text/x-wiki
Cluster Manager / Radio Astronomy PhD student
b621d283c4e2c20e45e38f4201bf916be908a55e
User talk:Jmcgarry
3
91
727
2023-09-26T10:33:16Z
Mjh
2
Welcome!
wikitext
text/x-wiki
'''Welcome to ''Clusterwiki''!'''
We hope you will contribute much and well.
You will probably want to read the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents help pages].
Again, welcome and have fun! [[User:Mjh|Mjh]] ([[User talk:Mjh|talk]]) 10:33, 26 September 2023 (UTC)
41b5f93d0ce19e5df3b28e0d32e062d22a900503
Terms of use
0
77
728
549
2023-10-03T15:24:15Z
Mjh
2
wikitext
text/x-wiki
Use of the UHHPC facility is subject to terms and conditions. To apply for or continue to hold an account you must explicitly agree to these conditions.
* Access to UHHPC is available to three classes of people:
*# Members of the University of Hertfordshire (UH), including undergraduate students, who have a legitimate need to use the cluster facilities. Undergraduate students must apply through an academic supervisor.
*# External collaborators of UH research staff, for work on projects that will directly benefit UH.
*# Members of external consortia who have negotiated an agreement with UH (e.g. [[WEAVE]], [[LOFAR-UK Compute Facility|LOFAR-UK]]).
* Access is conditional on abiding by the [[policies]] that govern use of the cluster. Access may be withdrawn at the discretion of the [[administrators]] in extreme cases.
* Access is given to individuals. Account credentials may not be shared. An individual is solely responsible for the use made of their account.
* UH will store the full name and contact e-mail address of all users. A valid, monitored e-mail address must be on record for all users. Users must notify the [[administrators]] of any change in their contact details. Your details will be stored securely on the cluster itself and will be used both for a cluster mailing list and to contact you in case of problems with your use of the cluster.
* The administrators may take whatever actions they feel necessary for troubleshooting or to ensure the smooth operation and security of the facility, which may include inspecting any data or programs stored on the cluster.
* UH takes no responsibility for backing up users' data. Users store data on UHHPC at their own risk.
* UH makes no guarantee about the level of service provided at any given time.
* Access is provided only while it is needed. Users must notify the [[administrators]] when they no longer need access. Accounts will be deleted according to the [[Account cancellation policy]]. In particular if your registered e-mail address becomes invalid and we cannot contact you we will assume that your account can be deleted.
15326a895c2449a18417b94e5a1fdd81a179bf2e
Star-CCM+
0
50
729
359
2023-10-11T08:46:48Z
Jmcgarry
18
wikitext
text/x-wiki
Star-CCM+ is an engineering package which can be used to solve CFD problems.
This [http://{{SERVERNAME}}/docs/starccm.pdf guide] (kindly written by Vitaly Voloshin) shows you how to use Star-CCM+ on the cluster. Please note that this guide is now fairly old and may not fully reflect the best approach for the current setup.
STAR-CCM+ on the cluster gets its licences from zaxx.stca.herts.ac.uk, which is not part of the HPC system and is not under the control of the HPC team. Please make sure to set your CDLMD_LICENSE_FILE environment variable properly so that you can access the licence server. This can be done by setting up a .flexlmrc file which contains the line:
<pre>
CDLMD_LICENSE_FILE=1999@zaxx.stca.herts.ac.uk
</pre>
Alternatively, tcsh users can set their environment using:
<pre>
setenv CDLMD_LICENSE_FILE 1999@zaxx.stca.herts.ac.uk
</pre>
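Equivalently, bash users can set the variable with:
<pre>
export CDLMD_LICENSE_FILE=1999@zaxx.stca.herts.ac.uk
</pre>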
The following files are those listed in the guide:
*[http://{{SERVERNAME}}/docs/queue_set.sh queue_set.sh]
*[http://{{SERVERNAME}}/docs/starccm_start.sh starccm_start.sh]
*[http://{{SERVERNAME}}/docs/run.java run.java]
*[http://{{SERVERNAME}}/docs/surf_mesh.java surf_mesh.java]
*[http://{{SERVERNAME}}/docs/sv_mesh.java sv_mesh.java]
*[http://{{SERVERNAME}}/docs/vol_mesh.java vol_mesh.java]
ef3a19942afdd67cf9f116dbfa0ec8ee819fa55c